Entropy Projection Curved Gabor with Random Forest and SVM for Face Recognition
In this work, we propose a workflow for face recognition under occlusion using the entropy projection of the curved Gabor filter, creating a representative and compact feature vector that describes a face. Although the entropy projection already yields a reduced vector, there is still room for further dimensionality reduction. We therefore use a Random Forest classifier as an attribute selector, achieving a 97% reduction of the original vector while keeping suitable accuracy. A set of experiments on three public image databases (AR Face, Extended Yale B with occlusion, and FERET) illustrates the proposed methodology, evaluated with an SVM classifier. The experiments show promising results compared with approaches available in the literature: 98.05% accuracy on the complete AR Face, 97.26% on FERET, and 81.66% on Yale B with 50% occlusion.
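The selection-then-classification stage of this pipeline can be sketched as follows. This is a minimal illustration with synthetic data, not the authors' implementation: a Random Forest ranks feature importance, roughly 3% of the features are kept (a ~97% reduction, as in the abstract), and a linear SVM classifies the reduced vectors. The real inputs would be entropy-projection vectors computed from curved Gabor filter responses.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1000))     # 200 faces, 1000-dim toy feature vectors
y = rng.integers(0, 2, size=200)     # two identities (toy labels)
X[y == 1, :30] += 1.5                # make 30 features informative

selector = SelectFromModel(
    RandomForestClassifier(n_estimators=100, random_state=0),
    max_features=30,                 # keep ~3% of features (~97% reduction)
    threshold=-np.inf,               # rank purely by importance
)
pipe = make_pipeline(selector, SVC(kernel="linear"))
pipe.fit(X, y)
print(pipe.score(X, y))
```

The selector and classifier are chained in one pipeline, so the reduction learned from the Random Forest importances is applied consistently at training and prediction time.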
Multi-Modality Human Action Recognition
Human action recognition is useful in many application areas, e.g. video surveillance, human-computer interaction (HCI), video retrieval, gaming, and security. Recently, human action recognition has become an active research topic in computer vision and pattern recognition, and a number of approaches have been proposed. However, most of them are designed for RGB image sequences, where the action data is collected by an RGB/intensity camera, so recognition performance suffers under varying occlusion, background, and lighting conditions. If more information is provided along with the image sequences, so that data sources other than RGB video can be utilised, human actions can be better represented and recognised by the designed computer vision system.

In this dissertation, multi-modality human action recognition is studied. On one hand, we introduce the study of multi-spectral action recognition, which involves information from spectra beyond the visible, e.g. infrared and near infrared. Action recognition in individual spectra is explored and new methods are proposed; cross-spectral action recognition is then investigated, with novel approaches proposed in our work. On the other hand, since depth imaging technology has made significant progress recently, depth information can be captured simultaneously with RGB video, and depth-based human action recognition is also investigated. I first propose a method that combines different types of depth data to recognise human actions. Then a thorough evaluation of spatiotemporal interest point (STIP) based features for depth-based action recognition is conducted. Finally, I advocate the study of fusing different features for depth-based action analysis. Moreover, human depression recognition is studied by combining a facial appearance model with a facial dynamics model.
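The multi-modality fusion idea above can be sketched at the feature level: descriptors computed from RGB and depth sequences are concatenated and fed to a single classifier. The features below are synthetic stand-ins (a real system would use e.g. STIP-based descriptors as evaluated in the dissertation), and the class structure is injected artificially for illustration.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(1)
n = 300
rgb_feat = rng.normal(size=(n, 64))    # per-clip RGB descriptor (stand-in)
depth_feat = rng.normal(size=(n, 32))  # per-clip depth descriptor (stand-in)
labels = rng.integers(0, 3, size=n)    # three toy action classes
for c in range(3):                     # inject class structure in both modalities
    rgb_feat[labels == c, c] += 2.0
    depth_feat[labels == c, c] += 2.0

fused = np.hstack([rgb_feat, depth_feat])  # early (feature-level) fusion
clf = LinearSVC(dual=False).fit(fused, labels)
print(clf.score(fused, labels))
```

Feature-level (early) fusion is only one option; decision-level fusion, where per-modality classifiers vote, is a common alternative when modalities are unreliable independently.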
Biologically-inspired hierarchical architectures for object recognition
PhD Thesis
Existing methods for machine vision translate three-dimensional objects in the real world into two-dimensional images. These methods have achieved acceptable performance in recognising objects. However, recognition performance drops dramatically when objects are transformed, for instance in background, orientation, position in the image, and scale. The human visual cortex has evolved to form an efficient invariant representation of objects within a scene. The superior performance of humans can be explained by the feed-forward multi-layer hierarchical structure of the visual cortex, in addition to the utilisation of different fields of vision depending on the recognition task. Therefore, the research community has investigated building systems that mimic the hierarchical architecture of the human visual cortex as an ultimate objective.
The aim of this thesis can be summarised as developing hierarchical models of visual processing that tackle the remaining challenges of object recognition. To enhance the existing models of object recognition and to overcome the above-mentioned issues, three major contributions are made, summarised as follows:
1. building a hierarchical model within an abstract architecture that achieves good performance on challenging image object datasets;
2. investigating the contribution of each region of vision for object and scene images, in order to increase recognition performance and decrease the size of the processed data;
3. further enhancing the performance of existing models of object recognition by introducing hierarchical topologies that utilise the context in which an object is found to determine its identity.
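The third contribution, using context to determine object identity, can be illustrated with a toy calculation. All scores below are invented for illustration: appearance-only classifier scores are multiplied by a scene-context prior and renormalised, in a simple Bayesian-style combination.

```python
# Appearance-only scores (ambiguous between "boat" and "car").
object_scores = {"boat": 0.40, "car": 0.45, "buoy": 0.15}
# Context prior for a hypothetical "harbour" scene.
context_prior = {"boat": 0.60, "car": 0.05, "buoy": 0.35}

# Combine appearance with context and renormalise to a distribution.
combined = {k: object_scores[k] * context_prior[k] for k in object_scores}
total = sum(combined.values())
combined = {k: v / total for k, v in combined.items()}

best = max(combined, key=combined.get)
print(best)   # context flips the decision from "car" to "boat"
```

The point is only that a weak appearance decision can be overturned by scene context; the thesis realises this idea with hierarchical topologies rather than an explicit prior table.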
Sponsored by the Higher Committee For Education Development in Iraq (HCED).
Biometric Systems
Biometric authentication has been widely used for access control and security systems over the past few years. The purpose of this book is to provide readers with the life cycle of different biometric authentication systems, from design and development to qualification and final application. The major systems discussed include fingerprint identification, face recognition, iris segmentation and classification, signature verification, and other miscellaneous systems covering management policies of biometrics, reliability measures, pressure-based typing and signature verification, bio-chemical systems, and behavioural characteristics. In summary, this book provides students and researchers with different approaches to developing biometric authentication systems, and at the same time includes state-of-the-art approaches to their design and development. The approaches have been thoroughly tested on standard databases and in real-world applications.
Improved terrain type classification using UAV downwash dynamic texture effect
The ability to autonomously navigate in an unknown, dynamic environment while classifying various terrain types remains a significant challenge for the computer vision research community. Addressing these problems is of great interest for the development of collaborative autonomous navigation robots. For example, an Unmanned Aerial Vehicle (UAV) can be used to determine a path while an Unmanned Surface Vehicle (USV) follows that path to reach the target destination. For the UAV to be able to determine whether a path is valid, it must be able to identify the type of terrain it is flying over. With the help of its rotor air flow (known as the downwash effect), it becomes possible to extract advanced texture features, used for terrain type classification.
This dissertation presents a complete analysis of the extraction of static and dynamic texture features, proposing various algorithms and analysing their pros and cons. A UAV equipped with a single RGB camera was used to capture images, and a Multilayer Neural Network was used for the automatic classification of water and non-water terrains by means of the downwash effect created by the UAV rotors. The terrain type classification results are then merged into a georeferenced dynamic map, where it is possible to distinguish between water and non-water areas in real time.
To improve the algorithms' processing time, several sequential processes were converted into parallel processes and executed on the UAV onboard GPU with the CUDA framework, achieving speedups of up to 10x. A comparison between the processing times of these two processing modes, sequential on the CPU and parallel on the GPU, is also presented in this dissertation.
All the algorithms were developed using open-source libraries, and were analysed and validated both in simulation and in real environments. To evaluate the robustness of the proposed algorithms, the studied terrains were tested with and without the presence
of the downwash effect. It was concluded that the classifier could be improved by combining static and dynamic texture features, achieving an accuracy higher than 99% in the classification of water and non-water terrain.
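The classification stage described above can be sketched as follows. This is a hedged illustration, not the dissertation's code: a multilayer neural network separates water from non-water terrain using a combination of "static" and "dynamic" texture feature vectors. The features here are synthetic; the real ones would be computed from downwash-perturbed imagery.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(2)
n = 400
static_feat = rng.normal(size=(n, 16))   # stand-in for static texture features
dynamic_feat = rng.normal(size=(n, 16))  # stand-in for temporal (downwash) features
is_water = rng.integers(0, 2, size=n)
dynamic_feat[is_water == 1] += 1.0       # water ripples alter dynamic texture

# Combine static + dynamic features, as the dissertation concludes is best.
X = np.hstack([static_feat, dynamic_feat])
clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500,
                    random_state=0).fit(X, is_water)
print(clf.score(X, is_water))
```

In this toy setup the discriminative signal lives only in the dynamic features, which mirrors the dissertation's finding that downwash-induced dynamic texture is what makes water separable.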
Text detection and recognition from natural images
Text detection and recognition from images has numerous practical applications, such as document analysis; assistance for visually impaired people; recognition of vehicle license plates; evaluation of articles containing tables, street signs, maps, and diagrams; keyword-based image exploration; document retrieval; recognition of parts within industrial automation; content-based extraction; object recognition; address block location; and text-based video indexing. This research exploited the advantages of artificial intelligence (AI) to detect and recognise text from natural images, using machine learning and deep learning to accomplish this task.

In this research, we conducted an in-depth literature review of current detection and recognition methods to identify the existing challenges: differences in text alignment, style, size, and orientation, combined with low image contrast and complex backgrounds, make automatic text extraction a considerably challenging task. As a result, state-of-the-art approaches obtain low detection rates (often less than 80%) and recognition rates (often less than 60%), which has led to the development of new approaches. The aim of the study was to develop a robust method for text detection and recognition from natural images with high accuracy and recall, which served as the target of the experiments. This method should detect all the text in scene images, despite the specific features associated with the text pattern.
Furthermore, we aimed to solve the two main problems of detecting and recognising arbitrarily shaped text (horizontal, multi-oriented, and curved) in low-resolution scenes, at various scales and sizes.

In this research, we propose a methodology that handles text detection through a novel feature combination and selection scheme for classifying text/non-text regions. Text-region candidates were extracted from grey-scale images using the MSER technique. A machine learning-based method was then applied to refine and validate the initial detection. The effectiveness of features based on the aspect ratio and the GLCM, LBP, and HOG descriptors was investigated. Text-region classifiers based on MLP, SVM, and RF were trained using selections of these features and their combinations. The publicly available ICDAR 2003 and ICDAR 2011 datasets were used to evaluate the proposed method. It achieved state-of-the-art performance among machine learning methodologies on both databases, with significant improvements in Precision, Recall, and F-measure: the F-measure was 81% on ICDAR 2003 and 84% on ICDAR 2011. The results showed that a suitable feature combination and selection approach can significantly increase the accuracy of the algorithms.

A new dataset is also proposed to fill the gap in character-level annotation and in the availability of multi-oriented and curved text. It was created particularly for deep learning methods, which require a large and varied range of training data. The dataset includes 2,100 images annotated at the character and word levels, yielding 38,500 samples of English characters and 12,500 words. Furthermore, an augmentation tool is proposed to support the dataset.
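The feature-combination finding can be sketched with a toy experiment. The descriptors below are synthetic stand-ins for vectors such as LBP or HOG (not real image features), each made weakly informative on its own; a linear SVM is cross-validated on each descriptor and on their concatenation.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(3)
n = 400
y = rng.integers(0, 2, size=n)        # 1 = text region, 0 = background
feat_a = rng.normal(size=(n, 20))     # stand-in for e.g. an LBP descriptor
feat_b = rng.normal(size=(n, 20))     # stand-in for e.g. a HOG descriptor
feat_a[y == 1, :3] += 1.0             # each descriptor is weakly informative
feat_b[y == 1, :3] += 1.0

accs = {}
for name, X in [("LBP-like", feat_a), ("HOG-like", feat_b),
                ("combined", np.hstack([feat_a, feat_b]))]:
    accs[name] = cross_val_score(SVC(kernel="linear"), X, y, cv=5).mean()
print(accs)
```

Because the two descriptors carry independent evidence, the concatenated representation typically scores higher than either alone, which is the effect the thesis exploits with its combination and selection scheme.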
The lack of an augmentation tool for object detection motivated the proposed tool, which can update the positions of bounding boxes after transformations are applied to the images. This technique helps to increase the number of samples in the dataset and reduces annotation time, since no re-annotation is required.

The final part of the thesis presents a novel approach to text spotting: a new framework for an end-to-end character detection and recognition system designed using an improved SSD convolutional neural network, in which layers are added to the SSD network and the aspect ratio of characters is treated specially, because it differs from that of other objects. Compared with the other methods considered, the proposed method can detect and recognise characters by training the end-to-end model completely. The performance of the proposed method was best on the proposed dataset, with an F-measure of 90.34. Furthermore, its F-measure on ICDAR 2015, ICDAR 2013, and SVT was 84.5, 91.9, and 54.8, respectively; on ICDAR 2013 it achieved the second-best accuracy. The proposed method can spot arbitrarily shaped (horizontal, oriented, and curved) scene text.
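The core idea of the augmentation tool, updating bounding-box annotations automatically when the image is transformed, can be sketched for two simple transformations. Boxes are `(x_min, y_min, x_max, y_max)` tuples; the function names and box format are illustrative assumptions, not the thesis's API.

```python
def flip_box_horizontal(box, img_width):
    """Mirror a box across the vertical centre line of a flipped image."""
    x_min, y_min, x_max, y_max = box
    # The old right edge becomes the new left edge, and vice versa.
    return (img_width - x_max, y_min, img_width - x_min, y_max)

def scale_box(box, factor):
    """Rescale a box when the whole image is resized by `factor`."""
    return tuple(v * factor for v in box)

box = (10, 20, 50, 80)                 # a character box in a 200-px-wide image
print(flip_box_horizontal(box, 200))   # → (150, 20, 190, 80)
print(scale_box(box, 0.5))             # → (5.0, 10.0, 25.0, 40.0)
```

Composing such per-transformation box updates with the image transformations is what lets an augmentation pipeline multiply the number of annotated samples with no manual re-annotation.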