RGB-T salient object detection via fusing multi-level CNN features
RGB-induced salient object detection has recently witnessed substantial progress, attributed to the superior feature learning capability of deep convolutional neural networks (CNNs). However, such detectors struggle in challenging scenarios characterized by cluttered backgrounds, low-light conditions and variations in illumination. Instead of improving RGB-based saliency detection alone, this paper exploits the complementary benefits of RGB and thermal infrared images. Specifically, we propose a novel end-to-end network for multi-modal salient object detection, which turns the challenge of RGB-T saliency detection into a CNN feature fusion problem. To this end, a backbone network (e.g., VGG-16) is first adopted to extract coarse features from each RGB or thermal infrared image individually, and then several adjacent-depth feature combination (ADFC) modules are designed to extract multi-level refined features for each single-modal input image, considering that features captured at different depths differ in semantic information and visual detail. Subsequently, a multi-branch group fusion (MGF) module is employed to capture cross-modal features by fusing the ADFC features of an RGB-T image pair at each level. Finally, a joint attention guided bi-directional message passing (JABMP) module performs saliency prediction by integrating the multi-level fused features from the MGF modules. Experimental results on several public RGB-T salient object detection datasets demonstrate the superiority of the proposed algorithm over state-of-the-art approaches, especially under challenging conditions such as poor illumination, complex backgrounds and low contrast.
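As an illustration of the per-level cross-modal fusion idea described above, the sketch below fuses same-level RGB and thermal feature maps by channel concatenation followed by a learned 1x1 projection. This is a hypothetical numpy stand-in (random weights instead of learned ones, simplified shapes), not the paper's actual MGF module.

```python
import numpy as np

def fuse_level(rgb_feat, thermal_feat, weights):
    """Fuse same-level RGB and thermal feature maps of shape (C, H, W)
    by channel concatenation followed by a 1x1 projection; a 1x1 conv
    over (2C, H, W) is just a (C, 2C) matmul over the channel axis."""
    stacked = np.concatenate([rgb_feat, thermal_feat], axis=0)  # (2C, H, W)
    c2, h, w = stacked.shape
    fused = weights @ stacked.reshape(c2, h * w)                # (C, H*W)
    return np.maximum(fused.reshape(-1, h, w), 0)               # ReLU

rng = np.random.default_rng(0)
rgb = rng.standard_normal((64, 14, 14))      # hypothetical backbone features
thermal = rng.standard_normal((64, 14, 14))
w = rng.standard_normal((64, 128)) * 0.1     # stand-in for learned weights
out = fuse_level(rgb, thermal, w)
print(out.shape)  # (64, 14, 14)
```

In the actual network this fusion would be repeated at every feature level, with the fused maps then passed to the prediction stage.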
Pattern Recognition
A wealth of advanced pattern recognition algorithms is emerging at the interface between effective visual features and the human-brain cognition process. Effective visual features are made possible through rapid developments in sensor equipment, novel filter designs, and viable information processing architectures, while a better understanding of the human-brain cognition process broadens the ways in which computers can perform pattern recognition tasks. This book collects representative research from around the globe focusing on low-level vision, filter design, features and image descriptors, data mining and analysis, and biologically inspired algorithms. The 27 chapters covered in this book disclose recent advances and new ideas in promoting the techniques, technology and applications of pattern recognition.
Biologically motivated keypoint detection for RGB-D data
With the emerging interest in active vision, computer vision researchers have become increasingly concerned with the mechanisms of attention. Therefore, several computational models of visual attention inspired by the human visual system have been developed, aiming at the detection of regions of interest in images.
This thesis is focused on selective visual attention, which provides a mechanism for the brain to focus computational resources on one object at a time, guided by low-level image properties (bottom-up attention). The task of recognizing objects in different locations is achieved by focusing on different locations, one at a time. Given the computational requirements of the proposed models, research in this area has been mainly of theoretical interest. More recently, psychologists, neurobiologists and engineers have built collaborations, with considerable benefits. The first objective of this doctoral work is to bring together concepts and ideas from these different research areas, providing a study of the biological research on the human visual system and a discussion of the interdisciplinary knowledge in this area, as well as the state of the art in (bottom-up) computational models of visual attention. Engineers usually refer to visual attention as saliency: when people fixate on a particular region of an image, it is because that region is salient. In this work, saliency methods are presented according to their classification (biologically plausible, computational or hybrid) and in chronological order.
A few salient structures can be used for applications such as object registration, retrieval or data simplification, and these salient structures can be considered keypoints when the aim is object recognition. Generally, object recognition algorithms use a large number of descriptors extracted at a dense set of points, which incurs a very high computational cost and prevents real-time processing. To avoid this computational burden, features have to be extracted from a small set of points, usually called keypoints. The use of keypoint-based detectors reduces both processing time and redundancy in the data. Local descriptors extracted from images have been extensively reported in the computer vision literature. Since there is a large set of keypoint detectors, a comparative evaluation between them is needed. We therefore describe 2D and 3D keypoint detectors and 3D descriptors, and evaluate existing 3D keypoint detectors on a publicly available point cloud library of real 3D objects. The invariance of the 3D keypoint detectors was evaluated with respect to rotations, scale changes and translations. This evaluation reports the robustness of each detector to changes of point of view, using the absolute and relative repeatability rates as criteria. In our experiments, the method that achieved the best repeatability rate was ISS3D.
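The repeatability evaluation described above can be sketched as follows: a reference keypoint counts as repeated if, after applying the known rigid transform to it, some keypoint detected in the transformed cloud lies within a small radius. This is a minimal numpy sketch under assumed conventions (4x4 homogeneous transforms, a hypothetical tolerance `eps`), not the thesis's exact protocol.

```python
import numpy as np

def repeatability(kp_ref, kp_test, transform, eps=0.01):
    """Absolute and relative repeatability of a 3D keypoint detector.
    kp_ref: (N, 3) keypoints from the original cloud; kp_test: (M, 3)
    keypoints from the transformed cloud; transform: 4x4 matrix applied
    to the original cloud."""
    homo = np.c_[kp_ref, np.ones(len(kp_ref))]            # (N, 4)
    mapped = (transform @ homo.T).T[:, :3]                # ref kps in test frame
    d = np.linalg.norm(mapped[:, None] - kp_test[None], axis=2)  # (N, M)
    repeated = int((d.min(axis=1) < eps).sum())
    return repeated, repeated / len(kp_ref)               # absolute, relative

# toy check: pure translation with identical detections -> full repeatability
kp = np.array([[0., 0., 0.], [1., 0., 0.], [0., 1., 0.]])
T = np.eye(4); T[:3, 3] = [0.5, 0., 0.]
absolute, relative = repeatability(kp, kp + [0.5, 0., 0.], T)
print(absolute, relative)  # 3 1.0
```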
The analysis of the human visual system and of biologically inspired saliency map detectors led to the idea of extending a keypoint detector with the color information available in the retina. This proposal produced a 2D keypoint detector inspired by the behavior of the early visual system. Our method is a color extension of the BIMP keypoint detector, in which we include both the color and intensity channels of an image: color information is included in a biologically plausible way, and multi-scale image features are combined into a single keypoint map. This detector is compared against state-of-the-art detectors and found particularly well-suited for tasks such as category and object recognition. The recognition process compares the 3D descriptors extracted at the locations indicated by the keypoints, after mapping the 2D keypoint locations to 3D space; this mapping is possible because the dataset provides the location of each point in both 2D and 3D. The evaluation allowed us to identify the best keypoint detector/descriptor pair on an RGB-D object dataset: our keypoint detector with the SHOTCOLOR descriptor achieves good category and object recognition rates, while the PFHRGB descriptor yields the best results.
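The descriptor-matching step of such a recognition pipeline can be sketched with a toy nearest-neighbour vote: each descriptor extracted at a test keypoint votes for the label of its closest model descriptor. This is a hypothetical simplification assuming descriptors (e.g. SHOTCOLOR or PFHRGB) were already extracted upstream; it is not the thesis's exact matching scheme.

```python
import numpy as np

def recognize(test_desc, model_desc, model_labels):
    """Nearest-neighbour recognition over descriptors extracted at
    keypoints: each test descriptor votes for the label of its closest
    model descriptor, and the majority vote wins."""
    d = np.linalg.norm(test_desc[:, None] - model_desc[None], axis=2)
    votes = model_labels[d.argmin(axis=1)]
    labels, counts = np.unique(votes, return_counts=True)
    return labels[counts.argmax()]

# toy 2D "descriptors" with two model categories
model = np.array([[0., 0.], [0.1, 0.], [5., 5.], [5.1, 5.]])
labels = np.array(["mug", "mug", "bowl", "bowl"])
test = np.array([[0.05, 0.02], [0.0, 0.1], [4.9, 5.0]])
print(recognize(test, model, labels))  # mug
```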
A 3D recognition system involves the choice of a keypoint detector and a descriptor. A new method for the detection of 3D keypoints on point clouds is presented, and a benchmark is performed for each pair of 3D keypoint detector and 3D descriptor to evaluate their performance on object and category recognition. These evaluations are done on a public database
of real 3D objects. Our keypoint detector is inspired by the behavior and neural architecture
of the primate visual system: the 3D keypoints are extracted based on a bottom-up 3D saliency
map, which is a map that encodes the saliency of objects in the visual environment. The saliency
map is determined by computing conspicuity maps (a combination across different modalities)
of the orientation, intensity and color information, in a bottom-up and purely stimulus-driven manner. These three conspicuity maps are fused into a 3D saliency map and, finally, the
focus of attention (or "keypoint location") is sequentially directed to the most salient points in
this map. Inhibiting this location automatically allows the system to attend to the next most
salient location. The main conclusions are: with a similar average number of keypoints, our 3D keypoint detector outperforms the other eight 3D keypoint detectors evaluated, achieving the best result in 32 of the evaluated metrics in the category and object recognition experiments, whereas the second-best detector obtained the best result in only 8 of these metrics. The only drawback is the computational time, since BIK-BUS is slower than the other detectors. Given that the differences in recognition performance, size and time requirements are large, the choice of keypoint detector and descriptor has to be matched to the task at hand, and we give some directions to facilitate this choice.
After proposing the 3D keypoint detector, the research focused on a robust detection and
tracking method for 3D objects by using keypoint information in a particle filter. This method
consists of three distinct steps: Segmentation, Tracking Initialization and Tracking. The segmentation removes all the background information, reducing the number of points for further processing. In the initialization, we use a biologically inspired keypoint detector; the information about the object we want to track is given by the extracted keypoints. The
particle filter tracks the keypoints, so we can predict where the keypoints will be in the next frame. One of the problems of a recognition system is the computational cost of keypoint detectors, and this method is intended to alleviate it. The experiments with the PFBIK-Tracking method are done indoors in an office/home environment, where personal robots are expected to operate. We quantitatively evaluate the stability of the overall tracking method using a "Tracking Error", computed from the keypoint and particle centroids. Comparing our system with the tracking method available in the Point Cloud Library, we achieve better results with a much smaller number of points and less computational time. Our method is faster and more robust to occlusion
when compared to the OpenniTracker.
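The particle-filter tracking step can be illustrated with a minimal bootstrap filter over a 3D keypoint centroid (a simplification of tracking every keypoint individually). The motion model, noise levels and particle count below are hypothetical choices for the sketch, not the PFBIK-Tracking parameters.

```python
import numpy as np

rng = np.random.default_rng(1)

def particle_filter_step(particles, weights, observation,
                         motion_std=0.05, obs_std=0.1):
    """One predict/update/resample cycle of a bootstrap particle filter
    tracking a 3D centroid."""
    # predict: random-walk motion model
    particles = particles + rng.normal(0, motion_std, particles.shape)
    # update: Gaussian likelihood of the observed centroid
    d2 = ((particles - observation) ** 2).sum(axis=1)
    weights = weights * np.exp(-0.5 * d2 / obs_std ** 2)
    weights /= weights.sum()
    # resample (multinomial), then reset to uniform weights
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx], np.full(len(particles), 1.0 / len(particles))

particles = rng.normal(0, 0.5, (500, 3))
weights = np.full(500, 1 / 500)
for obs in [np.array([0.1, 0.0, 0.0]), np.array([0.2, 0.05, 0.0])]:
    particles, weights = particle_filter_step(particles, weights, obs)
estimate = particles.mean(axis=0)  # should land near the last observation
```

In the actual system the observations would come from the biologically inspired keypoint detector applied to each segmented frame.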
Sea-Surface Object Detection Based on Electro-Optical Sensors: A Review
Sea-surface object detection is critical for the navigation safety of autonomous ships. Electro-optical (EO) sensors, such as video cameras, complement on-board radar in detecting small sea-surface objects. Traditionally, researchers have used horizon detection, background subtraction, and foreground segmentation techniques to detect sea-surface objects. Recently, deep learning-based object detection technologies have gradually been applied to sea-surface object detection. This article presents a comprehensive overview of sea-surface object-detection approaches in which the advantages and drawbacks of each technique are compared, covering four essential aspects: EO sensors and image types, traditional object-detection methods, deep learning methods, and maritime dataset collection. In particular, sea-surface object detection based on deep learning methods is thoroughly analyzed and compared, with highly influential public datasets introduced as benchmarks to verify the effectiveness of these approaches.
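One of the classical techniques named above, background subtraction, can be sketched with a running-average background model: pixels that deviate strongly from the background estimate are flagged as foreground. This is a toy numpy sketch on synthetic grayscale frames, with hypothetical `alpha` and `thresh` values; real maritime systems contend with waves, glare and camera motion.

```python
import numpy as np

def detect_foreground(frames, alpha=0.5, thresh=0.2):
    """Running-average background subtraction on grayscale frames in
    [0, 1]. Returns a boolean foreground mask for the last frame."""
    background = frames[0].astype(float)
    for frame in frames[1:]:
        mask = np.abs(frame - background) > thresh
        # update the background only where the scene looks static
        background = np.where(mask, background,
                              alpha * frame + (1 - alpha) * background)
    return mask

# synthetic sea scene: flat background with a small bright "obstacle"
# appearing only in the final frame
frames = [np.full((8, 8), 0.3) for _ in range(4)]
frames.append(np.full((8, 8), 0.3)); frames[-1][3:5, 3:5] = 0.9
mask = detect_foreground(np.array(frames))
print(mask.sum())  # 4
```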
The Evolution of First Person Vision Methods: A Survey
The emergence of new wearable technologies such as action cameras and smart glasses has increased the interest of computer vision scientists in the First Person perspective. Nowadays, this field is attracting the attention and investment of companies aiming to develop commercial devices with First Person Vision recording capabilities. Due to this interest, an increasing demand for methods to process these videos, possibly in real time, is expected. Current approaches present particular combinations of image features and quantitative methods to accomplish specific objectives such as object detection, activity recognition, and user-machine interaction. This paper summarizes the evolution of the state of the art in First Person Vision video analysis between 1997 and 2014, highlighting, among other things, the most commonly used features, methods, challenges and opportunities within the field.
Multi-target pig tracking algorithm based on joint probability data association and particle filter
In order to evaluate the health status of pigs in time, accurately monitor the disease dynamics of live pigs, and reduce morbidity and mortality in the existing large-scale farming model, pig detection and tracking technology based on machine vision is used to monitor pig behavior. However, it is challenging to detect and track pigs efficiently in the presence of noise caused by occlusion and interaction between targets. In view of the actual breeding conditions of pigs and the limitations of existing behavior-monitoring technology for individual pigs, this study proposed a multi-target tracking algorithm based on joint probability data association and a particle filter, using color features, the target centroid and the length-width ratio of the minimum circumscribed rectangle as features. Experimental results show that the proposed algorithm can quickly and accurately track pigs in video, and that it is able to cope with partial occlusions and recover tracks after temporary loss.
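The data-association step can be illustrated with a greedy gated matcher over the cues the abstract names: centroid position and the length-width ratio of the bounding rectangle. This is a deliberately simplified stand-in for full JPDA, which would instead weight every feasible track-detection hypothesis by its joint probability; the gate size and ratio weight below are hypothetical.

```python
import numpy as np

def associate(tracks, detections, gate=50.0, w_ratio=10.0):
    """Greedy gated track-detection association. Each entry is
    (cx, cy, aspect_ratio); cost mixes centroid distance and
    aspect-ratio difference, and pairs outside the gate are forbidden."""
    cost = np.full((len(tracks), len(detections)), np.inf)
    for i, t in enumerate(tracks):
        for j, d in enumerate(detections):
            dist = np.hypot(t[0] - d[0], t[1] - d[1])
            if dist < gate:  # gating: ignore implausible pairs
                cost[i, j] = dist + w_ratio * abs(t[2] - d[2])
    pairs = {}
    while np.isfinite(cost).any():
        i, j = np.unravel_index(np.argmin(cost), cost.shape)
        pairs[i] = j
        cost[i, :] = np.inf; cost[:, j] = np.inf
    return pairs

tracks = [(10., 10., 1.5), (100., 40., 2.0)]   # two tracked pigs
dets = [(98., 43., 2.1), (12., 11., 1.4)]      # detections, swapped order
print(associate(tracks, dets))  # {0: 1, 1: 0}
```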
Video content analysis for automated detection and tracking of humans in CCTV surveillance applications
This thesis was submitted for the degree of Doctor of Philosophy and awarded by Brunel University. The problems of achieving a high detection rate with a low false alarm rate for human detection and tracking in video sequences, performance scalability, and improving response time are addressed in this thesis. The underlying causes are the effects of scene complexity, human-to-human interactions, scale changes, and background-human interactions. A two-stage processing solution, namely human detection followed by human tracking, with two novel pattern classifiers is presented. Scale-independent human detection is achieved by processing in the wavelet domain using square wavelet features. These features, used to characterise human silhouettes at different scales, are similar to the rectangular features used in [Viola 2001]. At the detection stage, two detectors are combined to improve the detection rate. The first detector is based on the shape outline of humans extracted from the scene using a reduced-complexity outline extraction algorithm; a shape mismatch measure is used to differentiate between the human and background classes. The second detector uses rectangular features as primitives for silhouette description in the wavelet domain. The marginal distribution of features collocated at a particular position on a candidate human (a patch of the image) is used to describe the silhouette statistically. Two similarity measures are computed between a candidate human and the model histograms of the human and non-human classes, and used to discriminate between the two classes. At the tracking stage, a tracker based on the joint probabilistic data association filter (JPDAF) for data association and motion correspondence is presented. Track clustering is used to reduce the complexity of hypothesis enumeration.
Towards improving response time with increases in frame dimension, scene complexity, and number of channels, a scalable algorithmic architecture and an operating-accuracy prediction technique are presented. A scheduling strategy for improving response time and throughput by parallel processing is also presented.
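Rectangular features of the kind used above are cheap because of the integral image (summed-area table): any rectangle sum costs four lookups regardless of scale. The sketch below shows the table, the rectangle sum, and a two-rectangle "edge" feature; it is a generic Viola-Jones-style illustration, not the thesis's exact feature set.

```python
import numpy as np

def integral_image(img):
    """Summed-area table: ii[y, x] = sum of img[:y+1, :x+1]."""
    return np.cumsum(np.cumsum(img, axis=0), axis=1)

def rect_sum(ii, x, y, w, h):
    """Sum of the w*h rectangle with top-left corner (x, y), from four
    table lookups (a guard row/column of zeros handles the borders)."""
    p = np.pad(ii, ((1, 0), (1, 0)))
    return p[y + h, x + w] - p[y, x + w] - p[y + h, x] + p[y, x]

def two_rect_feature(ii, x, y, w, h):
    """Haar-like edge feature: left half minus right half."""
    return rect_sum(ii, x, y, w, h) - rect_sum(ii, x + w, y, w, h)

ii = integral_image(np.ones((4, 4)))
print(rect_sum(ii, 1, 1, 2, 2))        # 4.0
print(two_rect_feature(ii, 0, 0, 2, 4))  # 0.0 on a uniform image
```

Because the lookup count is constant, the same feature can be evaluated at any scale without rescaling the image, which is what makes scale-independent detection affordable.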
Object Tracking with Adaptive Multicue Incremental Visual Tracker
Generally, subspace learning based methods such as the Incremental Visual Tracker (IVT) have been shown to be quite effective for the visual tracking problem. However, the IVT may fail to follow the target when it undergoes drastic pose or illumination changes. In this work, we present a novel tracker that enhances the IVT algorithm by employing a multicue-based adaptive appearance model. First, we integrate cues both in feature space and in geometric space. Second, the integration depends directly on the dynamically changing reliabilities of the visual cues. These two aspects allow the tracker to adapt itself easily to changes in context and accordingly improve tracking accuracy by resolving ambiguities. Experimental results demonstrate that subspace-based tracking is strongly improved by exploiting multiple cues through the proposed algorithm.
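Reliability-dependent cue integration can be sketched as a weighted fusion of per-cue likelihood maps over candidate target states: each cue's vote counts in proportion to its current reliability. In the sketch below the reliabilities are simply given; an adaptive tracker would re-estimate them online. This is a generic illustration of the idea, not the paper's model.

```python
import numpy as np

def fuse_cues(likelihoods, reliabilities):
    """Reliability-weighted fusion of per-cue likelihood vectors over
    candidate target states; the result is renormalized to sum to 1."""
    w = np.asarray(reliabilities, float)
    w = w / w.sum()
    fused = np.tensordot(w, np.asarray(likelihoods), axes=1)
    return fused / fused.sum()

# two cues scoring three candidate states; the colour cue (reliability
# 0.9) currently dominates the unreliable edge cue (reliability 0.1)
color = np.array([0.7, 0.2, 0.1])
edges = np.array([0.1, 0.2, 0.7])
fused = fuse_cues([color, edges], [0.9, 0.1])
print(fused.argmax())  # 0
```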