
    Entropy Projection Curved Gabor with Random Forest and SVM for Face Recognition

    In this work, we propose a workflow for face recognition under occlusion using the entropy projection of the curved Gabor filter, creating a representative and compact feature vector that describes a face. Although the entropy projection already yields a reduced vector, it still offers room for further dimensionality reduction. We therefore use a Random Forest classifier as an attribute selector, providing a 97% reduction of the original vector while maintaining suitable accuracy. A set of experiments on three public image databases (AR Face, Extended Yale B with occlusion, and FERET) illustrates the proposed methodology, evaluated with an SVM classifier. The results are promising compared with existing approaches in the literature: 98.05% accuracy on the complete AR Face database, 97.26% on FERET, and 81.66% on Extended Yale B with 50% occlusion.
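The selection-then-classification stage described above can be sketched as follows. This is a hypothetical illustration, not the authors' exact pipeline: the data is synthetic, and the feature counts and parameters are assumptions chosen only to mirror the reported ~97% reduction of the feature vector.

```python
# Hypothetical sketch: a Random Forest ranks feature importances, a small
# fraction of features is kept, and an SVM classifies the reduced vectors.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Stand-in for the entropy-projection feature vectors (one row per face image).
X, y = make_classification(n_samples=600, n_features=1000, n_informative=30,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# Rank features with a Random Forest, then keep only the top 30 of 1000
# (~97% reduction, analogous to the reduction reported in the abstract).
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
selector = SelectFromModel(rf, max_features=30, threshold=-np.inf, prefit=True)
X_tr_sel, X_te_sel = selector.transform(X_tr), selector.transform(X_te)

# Classify the reduced vectors with an SVM.
svm = SVC(kernel="rbf").fit(X_tr_sel, y_tr)
print(X_tr_sel.shape[1], accuracy_score(y_te, svm.predict(X_te_sel)))
```

Using the forest purely as a ranker (rather than as the final classifier) keeps the SVM's input small while preserving the most discriminative attributes.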

    Multi-Modality Human Action Recognition

    Human action recognition is useful in many application areas, e.g. video surveillance, human-computer interaction (HCI), video retrieval, gaming, and security. It has recently become an active research topic in computer vision and pattern recognition, and a number of action recognition approaches have been proposed. However, most of them are designed for RGB image sequences collected by an RGB/intensity camera, so recognition performance is sensitive to the occlusion, background, and lighting conditions of the image sequences. If more information is available along with the image sequences, i.e. data sources other than RGB video, human actions can be better represented and recognized by the designed computer vision system.

    In this dissertation, multi-modality human action recognition is studied. On one hand, we introduce the study of multi-spectral action recognition, which involves information from spectra beyond the visible range, e.g. infrared and near infrared. Action recognition in individual spectra is explored and new methods are proposed; cross-spectral action recognition is also investigated, and novel approaches are proposed in our work. On the other hand, since depth imaging technology has made significant progress recently and depth information can now be captured simultaneously with RGB video, depth-based human action recognition is also investigated. I first propose a method that combines different types of depth data to recognize human actions. A thorough evaluation is then conducted on spatiotemporal interest point (STIP) based features for depth-based action recognition. Finally, I advocate fusing different features for depth-based action analysis. Moreover, human depression recognition is studied by combining a facial appearance model with a facial dynamics model.

    Medical Instrument Detection in 3D Ultrasound for Intervention Guidance


    Biologically-inspired hierarchical architectures for object recognition

    PhD Thesis. Existing methods for machine vision translate three-dimensional objects in the real world into two-dimensional images, and have achieved acceptable object-recognition performance. However, recognition performance drops dramatically when objects are transformed, for instance in background, orientation, position in the image, or scale. The human visual cortex has evolved to form an efficient, invariant representation of objects within a scene. Humans' superior performance can be explained by the feed-forward, multi-layer hierarchical structure of the visual cortex, in addition to the use of different fields of vision depending on the recognition task. The research community has therefore investigated building systems that mimic the hierarchical architecture of the human visual cortex. The aim of this thesis is to develop hierarchical models of visual processing that tackle the remaining challenges of object recognition. To enhance existing models of object recognition and overcome the above-mentioned issues, three major contributions are made: 1. building a hierarchical model within an abstract architecture that achieves good performance on challenging object-image datasets; 2. investigating the contribution of each region of vision for object and scene images, in order to increase recognition performance and decrease the size of the processed data; 3. further enhancing the performance of existing object-recognition models by introducing hierarchical topologies that use the context in which an object is found to determine its identity. This work was supported by the Higher Committee For Education Development in Iraq (HCED).

    Biometric Systems

    Biometric authentication has been widely used for access control and security systems over the past few years. The purpose of this book is to present the life cycle of different biometric authentication systems, from design and development to qualification and final application. The major systems discussed include fingerprint identification, face recognition, iris segmentation and classification, signature verification, and other miscellaneous systems covering biometric management policies, reliability measures, pressure-based typing and signature verification, bio-chemical systems, and behavioral characteristics. In summary, this book provides students and researchers with different approaches to developing biometric authentication systems, and includes state-of-the-art approaches to their design and development. The approaches have been thoroughly tested on standard databases and in real-world applications.

    Improved terrain type classification using UAV downwash dynamic texture effect

    The ability to navigate autonomously in an unknown, dynamic environment while classifying various terrain types is a significant challenge still faced by the computer vision research community. Addressing these problems is of great interest for the development of collaborative autonomous navigation robots. For example, an Unmanned Aerial Vehicle (UAV) can be used to determine a path, while an Unmanned Surface Vehicle (USV) follows that path to reach the target destination. For the UAV to determine whether a path is valid, it must be able to identify the type of terrain it is flying over. With the help of its rotor air flow (known as the downwash effect), it becomes possible to extract advanced texture features, used for terrain type classification. This dissertation presents a complete analysis of the extraction of static and dynamic texture features, proposing various algorithms and analyzing their pros and cons. A UAV equipped with a single RGB camera was used to capture images, and a multilayer neural network was used for the automatic classification of water and non-water terrain by means of the downwash effect created by the UAV rotors. The terrain type classification results are then merged into a georeferenced dynamic map, where it is possible to distinguish between water and non-water areas in real time. To improve the algorithms' processing time, several sequential processes were converted into parallel processes and executed on the UAV's onboard GPU with the CUDA framework, achieving speedups of up to 10x. A comparison between the processing times of these two modes, sequential on the CPU and parallel on the GPU, is also presented in this dissertation. All the algorithms were developed using open-source libraries and were analyzed and validated both in simulation and in real environments.
    To evaluate the robustness of the proposed algorithms, the studied terrains were tested with and without the presence of the downwash effect. It was concluded that the classifier could be improved by combining static and dynamic features, achieving an accuracy higher than 99% in the classification of water and non-water terrain.
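The core idea of classifying water via the downwash-induced dynamic texture can be sketched as below. This is an illustrative toy, not the dissertation's implementation: the frames are synthetic, and the feature set (temporal variance and frame differencing) and network size are assumptions.

```python
# Illustrative sketch (not the dissertation's pipeline): simple dynamic texture
# features — temporal variance over a short frame window — feed a multilayer
# neural network separating water from non-water patches.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

def dynamic_texture_features(frames):
    """frames: (T, H, W) grayscale patch sequence -> small feature vector.
    Downwash-agitated water flickers frame to frame; static terrain does not."""
    var_map = frames.var(axis=0)               # per-pixel temporal variance
    return np.array([var_map.mean(), var_map.std(),
                     np.abs(np.diff(frames, axis=0)).mean()])

# Synthetic stand-ins: "water" patches vary over time, "land" patches are static.
water = [rng.normal(0.5, 0.2, size=(8, 16, 16)) for _ in range(40)]
land = [np.tile(rng.normal(0.5, 0.05, size=(16, 16)), (8, 1, 1)) for _ in range(40)]
X = np.array([dynamic_texture_features(f) for f in water + land])
y = np.array([1] * 40 + [0] * 40)              # 1 = water, 0 = non-water

clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000,
                    random_state=0).fit(X, y)
print(clf.score(X, y))
```

The point of the sketch is the feature design: the rotor's downwash makes water surfaces temporally unstable, so even very cheap temporal statistics separate the two classes before any spatial texture analysis is applied.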

    Text-detection and -recognition from natural images

    Text detection and recognition from images has numerous practical applications in document analysis, such as assistance for visually impaired people; recognition of vehicle license plates; evaluation of articles containing tables, street signs, maps, and diagrams; keyword-based image exploration; document retrieval; recognition of parts in industrial automation; content-based extraction; object recognition; address block location; and text-based video indexing. This research exploited artificial intelligence (AI) to detect and recognise text from natural images, using machine learning and deep learning.

    In this research, we conducted an in-depth literature review of current detection and recognition methods to identify the existing challenges: differences in text alignment, style, size, and orientation, combined with low image contrast and complex backgrounds, make automatic text extraction a considerably challenging task. As a result, state-of-the-art approaches obtain low detection rates (often less than 80%) and recognition rates (often less than 60%), which has motivated the development of new approaches. The aim of the study was to develop a robust text detection and recognition method for natural images with high accuracy and recall, to be used as the target of the experiments. This method should detect all the text in scene images, despite the specific features associated with the text pattern.
    Furthermore, we aimed to solve the two main problems of detecting and recognising arbitrarily shaped text (horizontal, multi-oriented, and curved) in low-resolution scenes, at various scales and sizes.

    In this research, we propose a methodology that handles text detection by using novel feature combination and selection to support the classification of text/non-text regions. Text-region candidates were extracted from grey-scale images using the MSER technique. A machine learning-based method was then applied to refine and validate the initial detection. The effectiveness of features based on the aspect ratio and on GLCM, LBP, and HOG descriptors was investigated. Text-region classifiers (MLP, SVM, and RF) were trained using selections of these features and their combinations. The publicly available ICDAR 2003 and ICDAR 2011 datasets were used to evaluate the proposed method. It achieved state-of-the-art performance among machine learning methodologies on both databases, with significant improvements in precision, recall, and F-measure: the F-measure was 81% on ICDAR 2003 and 84% on ICDAR 2011. The results showed that a suitable feature combination and selection approach can significantly increase the accuracy of the algorithms.

    A new dataset is also proposed to fill the gap in character-level annotation and in the availability of multi-oriented and curved text. It was created particularly for deep learning methods, which require a large, complete, and varied range of training data. The dataset includes 2,100 images annotated at the character and word levels, yielding 38,500 samples of English characters and 12,500 words. Furthermore, an augmentation tool is proposed to support the dataset.
    The lack of an augmentation tool for object detection motivated the proposed tool, which can update the positions of bounding boxes after transformations are applied to the images. This technique increases the number of samples in the dataset and reduces annotation time, since no manual re-annotation is required. The final part of the thesis presents a novel approach for text spotting: a new framework for an end-to-end character detection and recognition system based on an improved SSD convolutional neural network, in which layers are added to the SSD network and the aspect ratio of characters is treated specially, because it differs from that of other objects. Compared with the other methods considered, the proposed method can detect and recognise characters by training the end-to-end model completely. Its performance was best on the proposed dataset, reaching 90.34. Furthermore, the F-measure on ICDAR 2015, ICDAR 2013, and SVT was 84.5, 91.9, and 54.8, respectively; on ICDAR 2013, the method achieved the second-best accuracy. The proposed method can spot arbitrarily shaped (horizontal, oriented, and curved) scene text.
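The text/non-text region classification stage described in this abstract can be sketched as follows. This is a hypothetical illustration under stated assumptions: the patches are synthetic stand-ins for MSER candidate regions, and the HOG parameters and linear SVM are illustrative choices, not the thesis's tuned configuration.

```python
# Hypothetical sketch of text/non-text classification: HOG descriptors of
# candidate patches (synthetic stand-ins for MSER regions) feed an SVM.
import numpy as np
from skimage.feature import hog
from sklearn.svm import SVC

rng = np.random.default_rng(1)

def make_patch(text_like):
    """32x32 grayscale patch: 'text' patches get stroke-like vertical bars."""
    patch = rng.normal(0.5, 0.05, size=(32, 32))
    if text_like:
        for c in range(4, 32, 8):          # regular vertical strokes
            patch[:, c:c + 2] = 1.0
    return np.clip(patch, 0.0, 1.0)

patches = [make_patch(True) for _ in range(60)] + \
          [make_patch(False) for _ in range(60)]
labels = np.array([1] * 60 + [0] * 60)     # 1 = text region, 0 = non-text

# HOG captures the oriented-gradient structure that distinguishes strokes
# from background clutter; LBP/GLCM features could be concatenated similarly.
X = np.array([hog(p, pixels_per_cell=(8, 8), cells_per_block=(2, 2))
              for p in patches])

clf = SVC(kernel="linear").fit(X, labels)
print(clf.score(X, labels))
```

In the full pipeline, the patches would come from an MSER detector over the grey-scale image, and the validated regions would then be grouped into words before recognition.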