48 research outputs found
Stereo Reconstruction using Induced Symmetry and 3D scene priors
PhD thesis in Electrical and Computer Engineering presented to the Faculdade de Ciências e Tecnologia da Universidade de Coimbra.
Recovering the 3D geometry from two or more views, known as stereo reconstruction,
is one of the earliest and most investigated topics in computer vision. The computation of 3D models of an environment is useful for a very large number of applications, ranging from robotics and consumer products to medical procedures. The principle for recovering the 3D scene structure is quite simple; however, some situations considerably complicate the reconstruction process. Objects with weakly textured or repetitive structures, as well as highly slanted surfaces, still pose difficulties to state-of-the-art algorithms.
This PhD thesis tackles these issues and introduces a new stereo framework that
is completely different from conventional approaches. We propose to use symmetry
instead of photo-similarity for assessing the likelihood of two image locations
being a match. The framework is called SymStereo, and is based on the mirroring
effect that arises whenever one view is mapped into the other using the homography induced by a virtual cut plane that intersects the baseline. Extensive experiments in dense stereo show that our symmetry-based cost functions compare favorably against the best performing photo-similarity matching costs. In addition, we investigate the possibility of accomplishing Stereo-Rangefinding, which consists in using passive stereo to recover depth exclusively along a scan plane. Thorough experiments provide evidence that Stereo from Induced Symmetry is especially well suited for this purpose.
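The mapping at the heart of this construction is the standard plane-induced homography. As a minimal sketch (not the thesis implementation), assuming calibrated cameras with intrinsics K1 and K2, relative pose (R, t), and a cut plane n·X = d expressed in the first camera's frame, the warp can be computed as:

```python
import numpy as np

def plane_induced_homography(K1, K2, R, t, n, d):
    """Homography H mapping pixels of view 1 into view 2, induced by the
    plane n . X = d (expressed in camera-1 coordinates), for the relative
    pose X2 = R X1 + t.  Standard formula: H = K2 (R + t n^T / d) K1^{-1}."""
    return K2 @ (R + np.outer(t, n) / d) @ np.linalg.inv(K1)
```

For any 3D point lying on the cut plane, projecting into view 1 and warping with H gives exactly its projection in view 2; points off the plane are displaced, which is what produces the mirroring effect the framework exploits.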
As a second research line, we propose to overcome the previous issues using
priors about the 3D scene for increasing the robustness of the reconstruction process.
For this purpose, we present a new global approach for detecting vanishing
points and groups of mutually orthogonal vanishing directions in man-made environments. Experiments in both synthetic and real images show that our algorithms
outperform the state-of-the-art methods while keeping computation tractable. In
addition, we show for the first time results in simultaneously detecting multiple
Manhattan-world configurations. This prior information about the scene structure
is then included in a reconstruction pipeline that generates piece-wise planar
models of man-made environments from two calibrated views. Our formulation
combines SymStereo and PEARL clustering [3], and alternates between a discrete optimization step, which merges planar surface hypotheses and discards detections with poor support, and a continuous optimization step, which refines the plane poses.
Experiments with both indoor and outdoor stereo pairs show significant improvements
over state-of-the-art methods with respect to accuracy and robustness.
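The alternation described above can be illustrated with a toy sketch. This is not the PEARL energy minimization of [3] (which uses graph cuts with label costs); it is a hedged simplification in which the discrete step greedily assigns points to plane hypotheses and discards poorly supported ones, and the continuous step refits each surviving plane:

```python
import numpy as np

def fit_plane(pts):
    """Least-squares plane through 3D points: returns (n, d) with n a unit
    normal and n . x ~= d for points x on the plane."""
    c = pts.mean(axis=0)
    _, _, vt = np.linalg.svd(pts - c)
    n = vt[-1]
    return n, float(n @ c)

def alternate(points, planes, iters=5, inlier_thr=0.2):
    """Toy alternation between a discrete and a continuous step.
    Discrete: assign each point to the hypothesis with the smallest
    point-to-plane residual; residuals above the threshold become outliers
    and hypotheses with too few inliers are discarded.
    Continuous: refine the pose of each surviving plane by refitting."""
    for _ in range(iters):
        res = np.stack([np.abs(points @ n - d) for n, d in planes])
        labels = res.argmin(axis=0)
        labels[res.min(axis=0) > inlier_thr] = -1   # poor support -> outlier
        planes = [fit_plane(points[labels == k])
                  for k in range(len(planes)) if (labels == k).sum() >= 3]
    return planes
```

Starting from perturbed hypotheses, a few iterations snap the plane poses onto the true surfaces while rejecting points that support neither hypothesis.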
Finally, and as a third contribution to improve stereo matching in the presence
of surface slant, we extend the recent framework of Histogram Aggregation
[4]. The original algorithm uses a fronto-parallel support window for cost aggregation, leading to inaccurate results in the presence of significant surface slant. We
address the problem by considering discrete orientation hypotheses. The experimental
results prove the effectiveness of the approach, which improves the matching accuracy while preserving a low computational complexity.
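A hedged sketch of cost aggregation with discrete orientation hypotheses (an illustration of the idea, not the histogram-based aggregation of [4] itself): each slant hypothesis (a, b) tilts the support window in disparity space, and (0, 0) recovers the fronto-parallel case:

```python
import numpy as np

def aggregate_slanted(cost, x, y, d, slants, radius=2):
    """Aggregate a matching-cost volume cost[y, x, d] over a square support
    window centred at (x, y, d), testing discrete slant hypotheses (a, b)
    that model a local disparity plane d(x+dx, y+dy) = d + a*dx + b*dy and
    keeping the lowest mean cost.  (0, 0) is the fronto-parallel window."""
    H, W, D = cost.shape
    best = np.inf
    for a, b in slants:
        total, count = 0.0, 0
        for dy in range(-radius, radius + 1):
            for dx in range(-radius, radius + 1):
                dd = int(round(d + a * dx + b * dy))
                yy, xx = y + dy, x + dx
                if 0 <= yy < H and 0 <= xx < W and 0 <= dd < D:
                    total += cost[yy, xx, dd]
                    count += 1
        if count and total / count < best:
            best = total / count
    return best
```

On a surface whose disparity varies linearly across the image, the matching slant hypothesis aggregates along the true surface and yields a lower cost than the fronto-parallel window.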
Data sharing in secure multimedia wireless sensor networks
© 2016 IEEE. The use of Multimedia Wireless Sensor Networks (MWSNs) is becoming common nowadays with the rapid growth in communication facilities. Like any other WSNs, these networks face various challenges in providing security, trust and privacy for user data. Provisioning the aforementioned services becomes an uphill task, especially when dealing with real-time streaming data. These networks operate with resource-constrained sensor nodes for days, months and even years, depending on the nature of the application. The resource-constrained nature of these networks makes it difficult for the nodes to handle real-time data in mission-critical applications such as military surveillance, forest fire monitoring, health care and industrial automation. For a secure MWSN, the transmission and processing of streaming data need to be explored in depth. Conventional data authentication schemes are not suitable for MWSNs due to the limitations imposed on sensor nodes in terms of battery power, computation, available bandwidth and storage. In this paper, we propose a novel quality-driven clustering-based technique for authenticating streaming data in MWSNs. Nodes with maximum energy are selected as Cluster Heads (CHs). The CHs collect data from member nodes and forward it to the Base Station (BS), thus preventing member nodes with low energy from dying soon and increasing the life span of the underlying network. The proposed approach not only authenticates the streaming data but also maintains the quality of the transmitted data. The proposed data authentication scheme, coupled with an Error Concealment technique, provides energy-efficient and distortion-free real-time data streaming. The proposed scheme is compared with an unsupervised-resources scenario. The simulation results demonstrate better network lifetime along with a 21.34 dB gain in the Peak Signal-to-Noise Ratio (PSNR) of received video data streams.
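The CH election rule described in the abstract reduces to picking the maximum-energy member of each cluster; a minimal sketch (the names and data layout are illustrative, not from the paper):

```python
def elect_cluster_heads(nodes, clusters):
    """For each cluster, elect the member with maximum residual energy as
    Cluster Head (CH); members send data to their CH, which forwards it to
    the Base Station.  `nodes` maps node id -> residual energy, `clusters`
    maps cluster id -> list of member node ids."""
    return {c: max(members, key=lambda n: nodes[n])
            for c, members in clusters.items()}
```

Re-running the election as energies deplete rotates the CH role away from drained nodes, which is what extends the network lifetime.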
QUIS-CAMPI: Biometric Recognition in Surveillance Scenarios
Concerns about the security of individuals have justified the increasing number of surveillance
cameras deployed in both private and public spaces. However, contrary to popular belief,
these devices are in most cases used solely for recording, instead of feeding intelligent analysis
processes capable of extracting information about the observed individuals. Thus, even though
video surveillance has already proved to be essential for solving multiple crimes, obtaining relevant
details about the subjects that took part in a crime depends on the manual inspection
of recordings. As such, the current goal of the research community is the development of
automated surveillance systems capable of monitoring and identifying subjects in surveillance
scenarios. Accordingly, the main goal of this thesis is to improve the performance of biometric
recognition algorithms in data acquired from surveillance scenarios. In particular, we aim at
designing a visual surveillance system capable of acquiring biometric data at a distance (e.g.,
face, iris or gait) without requiring human intervention in the process, as well as devising biometric
recognition methods robust to the degradation factors resulting from the unconstrained
acquisition process.
Regarding the first goal, the analysis of the data acquired by typical surveillance systems
shows that large acquisition distances significantly decrease the resolution of biometric samples,
and thus their discriminability is not sufficient for recognition purposes. In the literature,
diverse works point out Pan Tilt Zoom (PTZ) cameras as the most practical way for acquiring
high-resolution imagery at a distance, particularly when using a master-slave configuration. In
the master-slave configuration, the video acquired by a typical surveillance camera is analyzed
for obtaining regions of interest (e.g., car, person) and these regions are subsequently imaged
at high-resolution by the PTZ camera. Several methods have already shown that this configuration
can be used for acquiring biometric data at a distance. Nevertheless, these methods
failed at providing effective solutions to the typical challenges of this strategy, restraining its
use in surveillance scenarios. Accordingly, this thesis proposes two methods to support the development
of a biometric data acquisition system based on the cooperation of a PTZ camera
with a typical surveillance camera. The first proposal is a camera calibration method capable
of accurately mapping the coordinates of the master camera to the pan/tilt angles of the PTZ
camera. The second proposal is a camera scheduling method for determining - in real-time -
the sequence of acquisitions that maximizes the number of different targets obtained, while
minimizing the cumulative transition time. In order to achieve the first goal of this thesis,
both methods were combined with state-of-the-art approaches from the human monitoring field to develop a fully automated surveillance system capable of acquiring biometric data at a distance and without human cooperation, designated as the QUIS-CAMPI system.
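The camera scheduling idea can be illustrated with a greedy sketch (the thesis method computes the sequence differently; the deadline model, cost model, and names here are assumptions): at each step the PTZ camera moves to the closest target that can still be reached in time:

```python
def schedule(current, targets, speed=1.0):
    """Greedy scheduling sketch: repeatedly steer the PTZ camera to the
    feasible target with the smallest transition time.  `targets` maps
    target id -> (pan, tilt, deadline); a target is feasible if it can be
    reached before its deadline.  Transition time is modelled as the
    max-axis angular distance divided by the axis speed."""
    t, order = 0.0, []
    pending = dict(targets)
    while pending:
        dts = {k: max(abs(v[0] - current[0]), abs(v[1] - current[1])) / speed
               for k, v in pending.items()}
        feasible = {k: dt for k, dt in dts.items() if t + dt <= pending[k][2]}
        if not feasible:
            break                 # remaining targets can no longer be reached
        k = min(feasible, key=feasible.get)
        t += feasible[k]
        current = pending[k][:2]
        order.append(k)
        del pending[k]
    return order, t
```

The greedy rule trades optimality for real-time operation: it keeps cumulative transition time low while skipping targets whose observation window has already closed.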
The QUIS-CAMPI system is the basis for pursuing the second goal of this thesis. The analysis
of the performance of the state-of-the-art biometric recognition approaches shows that these
approaches attain almost ideal recognition rates in unconstrained data. However, this performance
is incongruous with the recognition rates observed in surveillance scenarios. Taking into
account the drawbacks of current biometric datasets, this thesis introduces a novel dataset comprising
biometric samples (face images and gait videos) acquired by the QUIS-CAMPI system at a
distance ranging from 5 to 40 meters and without human intervention in the acquisition process.
This set allows an objective assessment of the performance of state-of-the-art biometric recognition
methods on data that truly encompass the covariates of surveillance scenarios. As such, this set
was exploited to promote the first international challenge on biometric recognition in the wild. This thesis describes the evaluation protocols adopted, along with the results obtained
by the nine methods specially designed for this competition. In addition, the data acquired by
the QUIS-CAMPI system were crucial for accomplishing the second goal of this thesis, i.e., the
development of methods robust to the covariates of surveillance scenarios. The first proposal
regards a method for detecting corrupted features in biometric signatures inferred by a redundancy
analysis algorithm. The second proposal is a caricature-based face recognition approach
capable of enhancing the recognition performance by automatically generating a caricature
from a 2D photo. The experimental evaluation of these methods shows that both approaches
contribute to improving the recognition performance in unconstrained data.
Distributed Robotic Vision for Calibration, Localisation, and Mapping
This dissertation explores distributed algorithms for calibration, localisation, and mapping in the context of a multi-robot network equipped with cameras and onboard processing, comparing against centralised alternatives where all data is transmitted to a single external node on which processing occurs. With the rise of large-scale camera networks, and as low-cost on-board processing becomes increasingly feasible in robot networks, distributed algorithms are becoming important for robustness and scalability. Standard solutions to multi-camera computer vision require the data from all nodes to be processed at a central node, which represents a significant single point of failure and incurs infeasible communication costs. Distributed solutions solve these issues by spreading the work over the entire network, operating only on local calculations and direct communication with nearby neighbours.
This research considers a framework for a distributed robotic vision platform for calibration, localisation, and mapping tasks, where three main stages are identified: an initialisation stage where calibration and localisation are performed in a distributed manner, a local tracking stage where visual odometry is performed without inter-robot communication, and a global mapping stage where global alignment and optimisation strategies are applied. In consideration of this framework, this research investigates how algorithms can be developed to produce fundamentally distributed solutions, designed to minimise computational complexity whilst maintaining excellent performance, and designed to operate effectively in the long term. Therefore, three primary objectives are sought, aligning with these three stages.
Message passing algorithms - methods and applications
Algorithms on graphs are used extensively in many applications and research areas. Such applications include machine learning, artificial intelligence, communications, image processing, state tracking, sensor networks, sensor fusion, distributed cooperative estimation, and distributed computation. Among the types of algorithms that employ some kind of message passing over the connections in a graph, the work in this dissertation will consider belief propagation and gossip consensus algorithms.
We begin by considering the marginalization problem on factor graphs, which is often solved or approximated with Sum-Product belief propagation (BP) over the edges of the factor graph. For the case of sensor networks, where the conservation of energy is of critical importance and communication overhead can quickly drain this valuable resource, we present techniques for specifically addressing the needs of this low power scenario. We create a number of alternatives to Sum-Product BP. The first of these is a generalization of Stochastic BP with reduced setup time. We then present Projected BP, where a subset of elements from each message is transmitted between nodes, and computational savings are realized in proportion to the reduction in size of the transmitted messages. Zoom BP is a derivative of Projected BP that focuses particularly on utilizing low bandwidth discrete channels. We give the results of experiments that show the practical advantages of our alternatives to Sum-Product BP.
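The flavour of Projected BP, transmitting only a subset of each message's elements, can be sketched as follows (an illustrative projection, not necessarily the one used in the dissertation): keep the k largest entries and spread the residual mass uniformly, so only k (index, value) pairs cross the channel:

```python
import numpy as np

def project_message(msg, k):
    """Keep only the k largest entries of a normalized message and spread
    the residual probability mass uniformly over the rest, so that only k
    (index, value) pairs need to be transmitted between nodes."""
    msg = np.asarray(msg, float)
    msg = msg / msg.sum()
    idx = np.argsort(msg)[-k:]                      # k largest entries
    out = np.full(msg.shape, (1.0 - msg[idx].sum()) / (msg.size - k))
    out[idx] = msg[idx]
    return out
```

The communication saving scales with the reduction from the full alphabet size to k, at the price of a controlled approximation error in the flattened tail.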
We then proceed with an application of Sum-Product BP in sequential investment. We combine various insights from universal portfolios research in order to construct more sophisticated algorithms that take into account transaction costs. In particular, we use the insights of Blum and Kalai's transaction costs algorithm to take these costs into account in Cover and Ordentlich's side information portfolio and Kozat and Singer's switching portfolio. This involves carefully designing a set of causal portfolio strategies and computing a convex combination of these according to a carefully designed distribution. Universal (sublinear regret) performance bounds for each of these portfolios show that the algorithms asymptotically achieve the wealth of the best strategy from the corresponding portfolio strategy set, to first order in the exponent. The Sum-Product algorithm on factor graph representations of the universal investment algorithms provides computationally tractable approximations to the investment strategies. Finally, we present results of simulations of our algorithms and compare them to other portfolios.
We then turn our attention to gossip consensus and distributed estimation algorithms. Specifically, we consider the problem of estimating the parameters in a model of an agent's observations when it is known that the population as a whole is partitioned into a number of subpopulations, each of which has model parameters that are common among the member agents. We develop a method for determining the beneficial communication links in the network, which involves maintaining non-cooperative parameter estimates at each agent, and the distance of this estimate is compared with those of the neighbors to determine time-varying connectivity. We also study the expected squared estimation error of our algorithm, showing that estimates are asymptotically as good as centralized estimation, and we study the short term error convergence behavior.
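A toy version of the distance-gated update (an illustration under simplifying assumptions, with all-to-all candidate links and scalar parameters, not the dissertation's algorithm): each agent averages only with neighbours whose non-cooperative estimates fall within a threshold tau:

```python
import numpy as np

def gated_average_step(theta, noncoop, tau, step=0.5):
    """One gossip-style averaging step in which agent i cooperates with
    agent j only when their non-cooperative estimates differ by less than
    tau, giving the time-varying connectivity described above."""
    theta = np.asarray(theta, float)
    noncoop = np.asarray(noncoop, float)
    new = theta.copy()
    for i in range(theta.size):
        nbrs = np.abs(noncoop - noncoop[i]) < tau   # gated neighbour set
        new[i] = theta[i] + step * (theta[nbrs].mean() - theta[i])
    return new
```

Iterating the step drives each subpopulation to its own consensus value while leaving the gap between subpopulations intact, which is exactly the behaviour the gating is meant to protect.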
Finally, we examine the metrics used to guide the design of data converters in the setting of digital communications. The usual analog-to-digital converter (ADC) performance metrics---effective number of bits (ENOB), total harmonic distortion (THD), signal to noise and distortion ratio (SNDR), and spurious free dynamic range (SFDR)---are all focused on the faithful reproduction of observed waveforms, which is not of fundamental concern if the data converter is to be used in a digital communications system. Therefore, we propose other information-centric rather than waveform-centric metrics that are better aligned with the goal of communications. We provide computational methods for calculating the values of these metrics, some of which are derived from Sum-Product BP or related algorithms. We also propose Statistics Gathering Converters (SGCs), which represent a change in perspective on data conversion for communications applications away from signal representation and towards the collection of relevant statistics for the purposes of decision making and detection. We show how to develop algorithms for the detection of transmitted data when the transmitted signal is received by an SGC. Finally, we provide evidence for the benefits of using system-level metrics and statistics gathering converters in communications applications.
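As a reference point for the waveform-centric metrics being contrasted, ENOB follows from the measured SINAD (in dB) via the standard ideal-quantizer relation SNR = 6.02·N + 1.76 dB:

```python
# Effective number of bits from SINAD, the conventional waveform-centric
# ADC figure of merit that the information-centric metrics are contrasted with.
def enob(sinad_db: float) -> float:
    return (sinad_db - 1.76) / 6.02

# An ideal 12-bit converter has a SINAD of about 74 dB
print(round(enob(74.0), 2))  # 12.0
```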
Spatial and temporal background modelling of non-stationary visual scenes
PhD
The prevalence of electronic imaging systems in everyday life has become increasingly apparent
in recent years. Applications are to be found in medical scanning, automated manufacture, and
perhaps most significantly, surveillance. Metropolitan areas, shopping malls, and road traffic
management all employ and benefit from an unprecedented quantity of video cameras for monitoring
purposes. But the high cost and limited effectiveness of employing humans as the final
link in the monitoring chain has driven scientists to seek solutions based on machine vision techniques.
Whilst the field of machine vision has enjoyed consistent rapid development in the last
20 years, some of the most fundamental issues still remain to be solved in a satisfactory manner.
Central to a great many vision applications is the concept of segmentation, and in particular,
most practical systems perform background subtraction as one of the first stages of video
processing. This involves separation of ‘interesting foreground’ from the less informative but
persistent background. But the definition of what is ‘interesting’ is somewhat subjective, and
liable to be application specific. Furthermore, the background may be interpreted as including
the visual appearance of normal activity of any agents present in the scene, human or otherwise.
Thus a background model might be called upon to absorb lighting changes, moving trees and
foliage, or normal traffic flow and pedestrian activity, in order to effect what ‘biologically-inspired’
vision terms pre-attentive selection. This challenge is one of the Holy Grails
of the computer vision field, and consequently the subject has received considerable attention.
This thesis sets out to address some of the limitations of contemporary methods of background
segmentation by investigating methods of inducing local mutual support amongst pixels
in three starkly contrasting paradigms: (1) locality in the spatial domain, (2) locality in the
short-term time domain, and (3) locality in the domain of cyclic repetition frequency.
Conventional per pixel models, such as those based on Gaussian Mixture Models, offer no
spatial support between adjacent pixels at all. At the other extreme, eigenspace models impose
a structure in which every image pixel bears the same relation to every other pixel. But Markov
Random Fields permit definition of arbitrary local cliques by construction of a suitable graph, and
are used here to facilitate a novel structure capable of exploiting probabilistic local co-occurrence
of adjacent Local Binary Patterns. The result is a method exhibiting strong sensitivity to multiple
learned local pattern hypotheses, whilst relying solely on monochrome image data.
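The Local Binary Patterns on which that structure operates can be sketched for a single pixel; the neighbour ordering below is one common convention, chosen here for illustration:

```python
import numpy as np

# Minimal 8-neighbour Local Binary Pattern code for one pixel: each neighbour
# contributes a bit that is set when it is >= the centre intensity.
def lbp_code(patch: np.ndarray) -> int:
    """patch: 3x3 monochrome neighbourhood; returns an 8-bit LBP code."""
    centre = patch[1, 1]
    # Clockwise neighbour order starting at top-left (an illustrative choice)
    order = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for bit, (r, c) in enumerate(order):
        if patch[r, c] >= centre:
            code |= 1 << bit
    return code

patch = np.array([[10, 20, 30],
                  [40, 25, 30],
                  [ 5,  5,  5]])
print(lbp_code(patch))  # 140
```

Because the code depends only on intensity ordering, it is robust to monotonic lighting changes, which is what makes it attractive for monochrome background modelling.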
Many background models enforce temporal consistency constraints on a pixel in an attempt to
confirm background membership before it is accepted as part of the model, and typically some
control over this process is exercised by a learning rate parameter. But in busy scenes, a true
background pixel may be visible for a relatively small fraction of the time and in a temporally
fragmented fashion, thus hindering such background acquisition. However, support in terms of
temporal locality may still be achieved by using Combinatorial Optimization to derive short-term
background estimates which induce a similar consistency, but are considerably more robust
to disturbance. A novel technique is presented here in which the short-term estimates act as
‘pre-filtered’ data from which a far more compact eigen-background may be constructed.
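The eigen-background idea can be sketched with synthetic data; the frame sizes, rank, and injected foreground below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy eigen-background sketch: build a low-rank background subspace from
# short-term background estimates, then flag pixels with a large
# reconstruction residual as candidate foreground.
frames = rng.normal(100, 2, size=(20, 64))    # 20 estimate frames, 64 pixels each
mean = frames.mean(axis=0)
_, _, Vt = np.linalg.svd(frames - mean, full_matrices=False)
basis = Vt[:5]                                # keep the top 5 eigen-backgrounds

test_frame = mean.copy()
test_frame[10] += 50.0                        # inject a foreground pixel
coeffs = basis @ (test_frame - mean)          # project onto the subspace
recon = mean + basis.T @ coeffs
residual = np.abs(test_frame - recon)
print(int(residual.argmax()))                 # the injected pixel stands out
```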
Many scenes entail elements exhibiting repetitive periodic behaviour. Some road junctions
employing traffic signals are among these, yet little is to be found amongst the literature regarding
the explicit modelling of such periodic processes in a scene. Previous work focussing on gait
recognition has demonstrated approaches based on recurrence of self-similarity by which local
periodicity may be identified. The present work harnesses and extends this method in order
to characterize scenes displaying multiple distinct periodicities by building a spatio-temporal
model. The model may then be used to highlight abnormality in scene activity. Furthermore, a
Phase Locked Loop technique with a novel phase detector is detailed, enabling such a model to
maintain correct synchronization with scene activity in spite of noise and drift of periodicity.
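Recovering a scene's dominant period from a self-similarity signal can be sketched with a synthetic activity trace; the signal and periods below are invented, and the thesis's recurrence and PLL machinery is not reproduced here:

```python
import numpy as np

# A scene's frame-to-frame self-similarity repeats with the scene's period,
# which appears as a peak in the autocorrelation of the similarity signal.
t = np.arange(200)
activity = np.sin(2 * np.pi * t / 25)             # dominant period: 25 frames
sim = activity + 0.1 * np.sin(2 * np.pi * t / 7)  # weak secondary periodicity

centred = sim - sim.mean()
ac = np.correlate(centred, centred, mode='full')[len(t) - 1:]
lag = 5 + ac[5:100].argmax()                      # skip small lags near zero
print(lag)  # 25, the dominant period
```

A PLL-style tracker, as described above, would then adjust this period estimate continuously to follow drift in the scene's activity.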
This thesis contends that these three approaches are all manifestations of the same broad
underlying concept: local support in each of the space, time and frequency domains, and furthermore,
that the support can be harnessed practically, as will be demonstrated experimentally.
Exponential families on resource-constrained systems
This work is about the estimation of exponential family models on resource-constrained
systems. Our main goal is learning probabilistic models on devices with highly restricted
storage, arithmetic, and computational capabilities—so-called ultra-low-power
devices. Enhancing the learning capabilities of such devices opens up opportunities for
intelligent ubiquitous systems in all areas of life, from medicine through robotics to home
automation—to mention just a few. We investigate the inherent resource consumption of
exponential families, review existing techniques, and devise new methods to reduce the
resource consumption. The resource consumption, however, must not be reduced at all
cost. Exponential families possess several desirable properties that must be preserved:
Any probabilistic model encodes a conditional independence structure—our methods
keep this structure intact. Exponential family models are theoretically well-founded.
Instead of merely finding new algorithms based on intuition, our models are formalized
within the framework of exponential families and derived from first principles. We do
not introduce new assumptions which are incompatible with the formal derivation of the
base model, and our methods do not rely on properties of particular high-level applications.
To reduce the memory consumption, we combine and adapt reparametrization
and regularization in an innovative way that facilitates the sparse parametrization of
high-dimensional non-stationary time-series. The procedure allows us to load models in
memory constrained systems, which would otherwise not fit. We provide new theoretical
insights and prove that the uniform distance between the data generating process
and our reparametrized solution is bounded. To reduce the arithmetic complexity of
the learning problem, we derive the integer exponential family, based on the very definition
of sufficient statistics and maximum entropy estimation. New integer-valued
inference and learning algorithms are proposed, based on variational inference, proximal
optimization, and regularization. The weaker the underlying system, the larger the benefit of
this technique: for example, probabilistic inference on a state-of-the-art ultra-low-power
microcontroller can be accelerated by a factor of 250. While our integer inference
is fast, the underlying message passing relies on the variational principle, which is inexact
and has unbounded error on general graphs. Since exact inference and other existing
methods with bounded error exhibit exponential computational complexity, we employ
near minimax optimal polynomial approximations to yield new stochastic algorithms
for approximating the partition function and the marginal probabilities. Changing the
polynomial degree allows us to control the complexity and the error of our new stochastic
method. We provide an error bound that is parametrized by the number of samples, the
polynomial degree, and the norm of the model’s parameter vector. Moreover, important
intermediate quantities can be precomputed and shared with the weak computational device
to reduce the resource requirement of our method even further. All new techniques
are empirically evaluated on synthetic and real-world data, and the results confirm the
properties which are predicted by our theoretical derivation. Our novel techniques allow
a broader range of models to be learned on resource-constrained systems and imply
several new research possibilities.
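The polynomial approximation idea can be illustrated on a tiny exponential family where the partition function Z = Σ_x exp(θᵀφ(x)) is still computable exactly; the parameters and the use of a plain Taylor polynomial (rather than the thesis's near minimax optimal polynomials and stochastic estimator) are illustrative simplifications:

```python
import math
import numpy as np

# Tiny binary model over 3 variables with sufficient statistics phi(x) = x.
theta = np.array([0.3, -0.2, 0.5])        # illustrative, small-magnitude parameters
states = np.array([[(i >> k) & 1 for k in range(3)] for i in range(8)], dtype=float)
scores = states @ theta                    # theta^T phi(x) for every state

Z_exact = np.exp(scores).sum()             # exact partition function

def poly_exp(s, degree=4):
    # Degree-4 Taylor polynomial of exp around 0; accurate when |s| is small,
    # standing in for the polynomial surrogate used on weak hardware
    return sum(s ** k / math.factorial(k) for k in range(degree + 1))

Z_approx = poly_exp(scores).sum()
print(Z_exact, Z_approx)                   # close for small-magnitude parameters
```

Raising the polynomial degree tightens the approximation at the cost of more arithmetic, mirroring the complexity/error trade-off controlled by the degree in the abstract.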