
    Sensitivity analysis of AI-based algorithms for autonomous driving on optical wavefront aberrations induced by the windshield

Autonomous driving perception techniques are typically based on supervised machine learning models trained on real-world street data. A typical training process involves capturing images with a single car model and windshield configuration. However, deploying these trained models on different car types can lead to a domain shift, which can hurt the neural networks' performance and violate ADAS operating requirements. To address this issue, this paper investigates the domain shift problem by evaluating the sensitivity of two perception models to different windshield configurations. This is done by evaluating the dependencies between neural network benchmark metrics and optical merit functions, applying a Fourier-optics-based threat model. Our results show that windshields introduce a performance gap and that the existing optical metrics used for posing requirements might not be sufficient.
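As a rough illustration of the Fourier-optics idea, the sketch below turns a pupil-phase aberration into a point spread function and blurs a clean training image with it. The grid size, the use of a single Zernike-style defocus term, and the coefficient value are illustrative assumptions, not the paper's actual threat model.

```python
# Minimal sketch: pupil phase aberration -> PSF -> degraded image.
import numpy as np
from scipy.signal import fftconvolve

def aberrated_psf(n=64, defocus_waves=0.5):
    """PSF of a circular pupil with a defocus (Zernike Z4-like) phase term."""
    y, x = np.mgrid[-1:1:n*1j, -1:1:n*1j]
    r2 = x**2 + y**2
    pupil = (r2 <= 1.0).astype(float)             # circular aperture
    phase = 2*np.pi * defocus_waves * (2*r2 - 1)  # defocus wavefront (assumed)
    field = pupil * np.exp(1j * phase)
    psf = np.abs(np.fft.fftshift(np.fft.fft2(field)))**2
    return psf / psf.sum()                        # normalize total energy

def apply_windshield(image, defocus_waves=0.5):
    """Blur a (H, W) grayscale image with the aberrated PSF."""
    return fftconvolve(image, aberrated_psf(defocus_waves=defocus_waves), mode="same")
```

Sweeping `defocus_waves` and re-running the perception benchmark is one way to trace the dependency between an optical merit function and network performance that the paper describes.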

    An efficient framework for visible-infrared cross modality person re-identification

Visible-infrared cross-modality person re-identification (VI-ReId) is an essential task for video surveillance in poorly illuminated or dark environments. Despite many recent studies on person re-identification in the visible domain (ReId), there are few studies dealing specifically with VI-ReId. Besides challenges common to both ReId and VI-ReId, such as pose/illumination variations, background clutter and occlusion, VI-ReId faces the additional challenge that color information is not available in infrared images. As a result, the performance of VI-ReId systems is typically lower than that of ReId systems. In this work, we propose a four-stream framework to improve VI-ReId performance. We train a separate deep convolutional neural network in each stream using different representations of the input images, expecting that different and complementary features can be learned from each stream. In our framework, grayscale and infrared input images are used to train a ResNet in the first stream. In the second stream, RGB and three-channel infrared images (created by repeating the infrared channel) are used. In the remaining two streams, we use local pattern maps as input images; these maps are generated using a local Zernike moments transformation. Local pattern maps are obtained from grayscale and infrared images in the third stream, and from RGB and three-channel infrared images in the last stream. We further improve the performance of the proposed framework by employing a re-ranking algorithm for post-processing. Our results indicate that the proposed framework outperforms the current state of the art by a large margin, improving Rank-1/mAP by 29.79%/30.91% on the SYSU-MM01 dataset and by 9.73%/16.36% on the RegDB dataset.
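A minimal sketch of the four-stream idea follows: one backbone per input representation, with the per-stream embeddings concatenated for matching. The backbone choice (ResNet-50), the plain concatenation, and the channel-repetition helper are assumptions; the paper's local Zernike moments transformation is not reproduced here.

```python
# Minimal four-stream VI-ReId skeleton (PyTorch); each stream is a ResNet
# whose classification head is removed so it emits a 2048-d pooled feature.
import torch
import torch.nn as nn
from torchvision import models

def make_stream():
    net = models.resnet50(weights=None)
    net.fc = nn.Identity()              # keep the pooled 2048-d feature
    return net

class FourStreamVIReId(nn.Module):
    def __init__(self):
        super().__init__()
        self.streams = nn.ModuleList([make_stream() for _ in range(4)])

    def forward(self, gray, rgb, gray_lzm, rgb_lzm):
        # Each tensor is (B, 3, H, W); infrared inputs enter the same streams
        # with their single channel repeated three times (see helper below).
        feats = [s(x) for s, x in zip(self.streams, (gray, rgb, gray_lzm, rgb_lzm))]
        return torch.cat(feats, dim=1)  # (B, 4*2048) identity embedding

def to_three_channels(ir):              # (B, 1, H, W) -> (B, 3, H, W)
    return ir.repeat(1, 3, 1, 1)
```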

    New human action recognition scheme with geometrical feature representation and invariant discretization for video surveillance

Human action recognition is an active research area in computer vision because of its many applications in video surveillance, video retrieval, security systems, video indexing and human-computer interaction. Action recognition deals with time-varying feature data generated by humans under different viewpoints, and aims to build a mapping between dynamic image information and semantic understanding. Although a great deal of progress has been made in the recognition of human actions over the last two decades, the approaches reported in the literature remain limited, and much research is still needed to address the ongoing challenges and develop more efficient approaches. Feature extraction is the core task of any action recognition procedure: it transforms the input data describing the shape of a segmented silhouette of a moving person into a set of features representing action poses. In video surveillance, global moment invariants based on Geometrical Moment Invariants (GMI) are widely used for human action recognition. However, GMI has several drawbacks, such as its lack of a granular interpretation of the invariants relative to the shape, and as a consequence the representation of features has not been standardized. Hence, this study proposes a new human action recognition (HAR) scheme that uses geometrical moment invariants for feature extraction and supervised invariant discretization to identify the uniqueness of actions in video sequences. The proposed scheme is tested on the IXMAS dataset, whose video sequences contain non-rigid human poses resulting from drastic illumination changes, pose variation and erratic motion patterns. The invariance of the proposed scheme is validated through intra-class and inter-class analysis. The proposed scheme yields better action recognition performance than the conventional scheme, with an average accuracy of more than 99%, while preserving the shape of the human actions in the video images.
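To make the feature-extraction step concrete, here is a minimal sketch that computes moment invariants from a binary silhouette and bins them into discrete codes. OpenCV's Hu moments stand in for the paper's GMI formulation, and the equal-width binning is an illustrative simplification of the supervised invariant discretization described above.

```python
# Minimal sketch: silhouette -> moment invariants -> discrete pose code.
import cv2
import numpy as np

def silhouette_invariants(mask):
    """mask: binary (H, W) uint8 silhouette -> 7 log-scaled Hu invariants."""
    m = cv2.moments(mask, binaryImage=True)
    hu = cv2.HuMoments(m).flatten()
    # Log-scale to compress the invariants' huge dynamic range.
    return -np.sign(hu) * np.log10(np.abs(hu) + 1e-30)

def discretize(features, n_bins=10, lo=-30.0, hi=30.0):
    """Map each invariant to an integer bin so similar poses share a code."""
    edges = np.linspace(lo, hi, n_bins + 1)
    return np.clip(np.digitize(features, edges) - 1, 0, n_bins - 1)
```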

    Human-Centric Machine Vision

Recently, algorithms for processing visual information have evolved greatly, providing efficient and effective solutions that cope with the variability and complexity of real-world environments. These achievements have led to the development of Machine Vision systems that go beyond typical industrial applications, where environments are controlled and tasks are very specific, towards innovative solutions that address people's everyday needs. Human-Centric Machine Vision can help to solve problems raised by the needs of our society, e.g. security and safety, health care, medical imaging, and human-machine interfaces. Such applications must handle changing, unpredictable and complex situations, and must account for the presence of humans.

    Improving the Performance of the Space Surveillance Telescope as a Function of Seeing Parameter

This research paper investigates ways to improve the detection capability and predict the performance of the Space Surveillance Telescope (SST) system when it is relocated to Exmouth, Australia. A dataset collected by the SST while observing the Geosynchronous Earth Orbit (GEO) satellite ANIK-F1 entering the earth's eclipse is used to test the performance of three existing detection algorithms and one new one. The three existing algorithms are point detection (Binary Hypothesis Test, BHT), correlation detection (CD-BHT), and the Multi-hypothesis Test using ten hypotheses (MHT10); the new detection algorithm is the Multi-hypothesis Test using six hypotheses (MHT6). To improve the accuracy and validity of the comparison, a new method of obtaining the true atmospheric seeing parameter and the terminator (the point before the object enters eclipse), as well as the parameters used for the comparison, are also investigated. It is found that the MHTs vastly outperform the BHTs, and that MHT6 offers similar or improved performance over MHT10 while requiring only half the computing power.
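For orientation, the sketch below shows the simplest of the compared detectors, a per-pixel binary hypothesis test: a pixel is flagged when its intensity rejects the noise-only hypothesis. The Gaussian-noise assumption, the robust MAD noise estimate, and the false-alarm-rate threshold are illustrative choices, not the SST pipeline's actual parameters.

```python
# Minimal sketch of a point-detection binary hypothesis test (BHT).
import numpy as np
from scipy.stats import norm

def bht_detect(frame, p_fa=1e-4):
    """Flag pixels whose intensity rejects the noise-only hypothesis H0."""
    med = np.median(frame)
    sigma = 1.4826 * np.median(np.abs(frame - med))  # robust noise estimate
    tau = med + sigma * norm.isf(p_fa)  # threshold for the chosen false-alarm rate
    return frame > tau                  # boolean detection map
```

The multi-hypothesis tests (MHT10, MHT6) extend this idea by testing each pixel neighbourhood against several candidate target models rather than a single noise-only alternative.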

    Trade mark similarity assessment support system

Trade marks are valuable intangible intellectual property (IP) assets with potentially high reputational value. Similarity between trade marks may lead to infringement, and that similarity is normally assessed based on the visual, conceptual and phonetic aspects of the trade marks in question. This thesis therefore proposes a trade mark similarity assessment support system that uses the three main aspects of trade mark similarity as a mechanism to avoid future infringement. A conceptual model of the proposed system is first developed based on the similarity assessment criteria outlined in a trade mark manual. The proposed model is the first contribution of this study, and it consists of visual, conceptual, phonetic and inference engine modules. The second contribution is an algorithm that compares trade marks based on their visual similarity. The algorithm performs the assessment using content-based image retrieval (CBIR) technology and an integrated visual descriptor derived from a low-level image feature, the shape feature. The performance of the algorithm is assessed using information retrieval based measures, and the results demonstrate better retrieval performance than the state-of-the-art algorithm. The conceptual aspect of trade mark similarity is then examined using a proposed algorithm that employs semantic technology in the conceptual module. This contribution enables the computation of conceptual similarity between trade marks using an external knowledge source in the form of a lexical ontology, together with natural language processing and set similarity theory. The proposed algorithm is evaluated using both information retrieval and human collective opinion measures, and it outperforms the traditional string similarity comparison algorithm on both. The phonetic module examines the phonetic similarity of trade marks using another proposed algorithm based on phoneme analysis. This algorithm employs phonological features extracted from human speech articulation, and also provides a mechanism to compare the phonetic aspect of trade marks containing typographic characters. The proposed algorithm is the fourth contribution of this study; evaluated using an information retrieval based measure, it shows better retrieval performance than the traditional string similarity algorithm. The final contribution is a methodology for aggregating the overall similarity score between trade marks, motivated by the understanding that trade mark similarity should be assessed holistically, i.e. with the visual, conceptual and phonetic aspects considered together. The proposed method is developed in the inference engine module and utilises fuzzy logic for the inference process. A set of fuzzy rules, consisting of several membership functions, is derived from the trade mark manual and an analysis of a collection of disputed trade mark cases. The method is evaluated using both information retrieval and human collective opinion; it improves retrieval accuracy, and the experiments show that the aggregated similarity score correlates well with the score produced from human collective opinion.
The evaluations performed in the course of this study employ the following datasets: the MPEG-7 shape dataset, the MPEG-7 trade marks dataset, a collection of 1,400 trade marks from real trade mark dispute cases, and a collection of 378,943 company names.
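To illustrate the aggregation step, here is a minimal Mamdani-style sketch that fuses visual, conceptual and phonetic scores in [0, 1] through a tiny fuzzy rule base. The triangular membership breakpoints and the two rules shown are invented for illustration; the thesis derives its actual rule set from the trade mark manual and the disputed cases.

```python
# Minimal sketch of fuzzy aggregation of the three per-aspect scores.
import numpy as np

def tri(x, a, b, c):
    """Triangular membership function peaking at b."""
    return np.maximum(np.minimum((x - a) / (b - a + 1e-9),
                                 (c - x) / (c - b + 1e-9)), 0.0)

def overall_similarity(visual, conceptual, phonetic):
    scores = np.array([visual, conceptual, phonetic])
    low  = tri(scores, -0.01, 0.0, 0.5)   # "low similarity" membership
    high = tri(scores, 0.5, 1.0, 1.01)    # "high similarity" membership
    # Rule 1: if ANY aspect is highly similar, the marks are similar.
    # Rule 2: if ALL aspects have low similarity, the marks are dissimilar.
    fire_similar, fire_dissimilar = high.max(), low.min()
    # Defuzzify as a weighted average of the rule consequents (1.0 and 0.0).
    return fire_similar / (fire_similar + fire_dissimilar + 1e-9)
```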

    Local and deep texture features for classification of natural and biomedical images

Developing efficient feature descriptors is very important in many computer vision applications, including biomedical image analysis. In the two decades before the rise of deep learning approaches to image classification, texture features proved very effective at capturing the gradient variation in an image. Following the success of the Local Binary Pattern (LBP) descriptor, many variations were introduced to further improve classification results. However, image classification becomes more complicated as the number of images and the number of classes increase, and more robust approaches are required. In this thesis, we address the problem of analyzing biomedical images using a combination of local and deep features. First, we propose a novel descriptor based on the motif Peano scan concept, called Joint Motif Labels (JML). We then combine the features extracted by the JML descriptor with two other descriptors, Rotation Invariant Co-occurrence among Local Binary Patterns (RIC-LBP) and Joint Adaptive Median Binary Patterns (JAMBP). In addition, we construct a descriptor called Motif Patterns encoded by RIC-LBP and use it in our classification framework. We enrich the framework by combining these local descriptors with features extracted from a pre-trained deep network, VGG-19: the 4096 features of the fully connected 'fc7' layer are extracted and combined with the proposed local descriptors. Finally, we show that a Random Forests (RF) classifier can be used to obtain superior performance in biomedical image analysis. Testing was performed on two standard biomedical datasets and three standard texture datasets. The results show that our framework beats state-of-the-art accuracy on the biomedical image analysis tasks, and that the combination of local features produces promising results on the standard texture datasets.
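A minimal sketch of the combined pipeline follows: a plain LBP histogram (a stand-in for the JML/RIC-LBP/JAMBP descriptors, which are not reproduced here) is concatenated with VGG-19 'fc7' activations and fed to a Random Forest. The LBP parameters and preprocessing are illustrative assumptions.

```python
# Minimal sketch: local texture histogram + deep fc7 features -> Random Forest.
import numpy as np
import torch, torch.nn as nn
from torchvision import models
from skimage.feature import local_binary_pattern
from sklearn.ensemble import RandomForestClassifier

vgg = models.vgg19(weights="IMAGENET1K_V1").eval()
# Truncate the classifier so its output is the 4096-d 'fc7' activation.
vgg.classifier = nn.Sequential(*list(vgg.classifier.children())[:5])

def describe(gray, rgb_tensor):
    """gray: (H, W) float image; rgb_tensor: normalized (1, 3, 224, 224)."""
    lbp = local_binary_pattern(gray, P=8, R=1, method="uniform")
    hist, _ = np.histogram(lbp, bins=10, range=(0, 10), density=True)
    with torch.no_grad():
        deep = vgg(rgb_tensor).squeeze(0).numpy()  # 4096 fc7 activations
    return np.concatenate([hist, deep])            # joint local + deep vector

clf = RandomForestClassifier(n_estimators=500)     # fit on describe() vectors
```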

    Hand gesture recognition system based in computer vision and machine learning: Applications on human-machine interaction

Doctoral thesis in Electronics and Computer Engineering. Hand gesture recognition is a natural way of human-computer interaction and an area of very active research in computer vision and machine learning. It is an area with many different possible applications, giving users a simpler and more natural way to communicate with robots and systems, without the need for extra devices. The primary goal of gesture recognition research applied to Human-Computer Interaction (HCI) is therefore to create systems which can identify specific human gestures and use them to convey information or control devices. For that, vision-based hand gesture interfaces require fast and extremely robust hand detection, and gesture recognition in real time. Nowadays, vision-based gesture recognition systems tend to be specific solutions, built to solve one particular problem and configured to work in a particular manner. This research project studied and implemented solutions that are generic enough, with the help of machine learning algorithms, to be applied in a wide range of human-computer interfaces for real-time gesture recognition. The proposed solution, the Gesture Learning Module Architecture (GeLMA), allows a set of commands based on static and dynamic gestures to be defined in a simple way and easily integrated and configured for use in a number of applications. It is easy to train and use, and since it is built mainly with open-source libraries it is also an inexpensive solution. Experiments showed that the system achieved an accuracy of 99.2% for static gesture (hand posture) recognition and an average accuracy of 93.7% for dynamic gesture recognition. To validate the proposed framework, two complete systems were implemented. The first is a real-time system able to help a referee judge a robotic soccer game. The proposed solution combines a vision-based hand gesture recognition system with a formal language definition, the Referee CommLang, into what is called the Referee Command Language Interface System (ReCLIS). The system builds a command from a set of static and dynamic gestures executed by the referee and sends it to a computer interface, which then transmits the proper commands to the robots. The second is a real-time system able to interpret a subset of the Portuguese Sign Language. The experiments showed that the system was able to reliably recognize the vowels in real time. Although the implemented solution was only trained to recognize the five vowels, it is easily extended to recognize the rest of the alphabet. These experiments also showed that the core of vision-based interaction systems can be the same for all applications, which facilitates implementation. The proposed framework has the advantage of being generic enough, and a solid foundation, for the development of hand gesture recognition systems that can be integrated into any human-computer interface application: the interface language can be redefined and the system can be easily configured and trained with a different set of gestures to be integrated into the final solution.
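As a rough illustration of the two-level structure such systems use, the sketch below pairs a per-frame static-posture classifier with a simple temporal vote that turns a stream of posture labels into a command. The feature vector, the linear SVM, and the 15-frame window are illustrative assumptions, not GeLMA's actual components.

```python
# Minimal sketch: per-frame posture classification + temporal command vote.
from collections import Counter, deque
from sklearn.svm import SVC

posture_clf = SVC(kernel="linear")  # must be fit offline on labeled hand features

class CommandRecognizer:
    def __init__(self, window=15):
        self.history = deque(maxlen=window)

    def update(self, hand_features):
        """hand_features: 1-D feature vector for the current frame."""
        label = posture_clf.predict([hand_features])[0]
        self.history.append(label)
        # Emit a command once one posture dominates the temporal window.
        top, count = Counter(self.history).most_common(1)[0]
        return top if count > 0.8 * self.history.maxlen else None
```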

    Fast and robust image feature matching methods for computer vision applications

Service robotic systems are designed to solve tasks such as recognizing and manipulating objects, understanding natural scenes, and navigating in dynamic and populated environments. Such tasks cannot be modeled in all necessary detail as easily as industrial robot tasks; a service robotic system therefore has to be able to sense and interact with the surrounding physical environment through a multitude of sensors and actuators. Environment sensing is one of the core problems limiting the deployment of mobile service robots, since existing sensing systems are either too slow or too expensive. Visual sensing is the most promising way to provide a cost-effective solution to the mobile robot sensing problem. It is usually achieved using one or several digital cameras placed on the robot or distributed in its environment. Digital cameras are information-rich, relatively inexpensive sensors that can be used to solve a number of key problems in robotics and other autonomous intelligent systems, such as visual servoing, robot navigation, object recognition, pose estimation, and much more. The key challenge in taking advantage of this powerful and inexpensive sensor is to devise algorithms that can reliably and quickly extract and match the visual information needed to automatically interpret the environment in real time. Although considerable research has been conducted in recent years on algorithms for computer and robot vision problems, open challenges remain in terms of reliability, accuracy and processing time. The Scale Invariant Feature Transform (SIFT) is one of the most widely used methods and has attracted much attention in the computer vision community, because SIFT features are highly distinctive, invariant to scale, rotation and illumination changes, and relatively easy to extract and to match against a large database of local features. SIFT has two main drawbacks: first, the computational complexity of the algorithm increases rapidly with the number of key-points, especially at the matching step, due to the high dimensionality of the SIFT feature descriptor; second, SIFT features are not robust to large viewpoint changes. These drawbacks limit the practical use of SIFT for robot vision applications, which often require real-time performance and must deal with large viewpoint changes. This dissertation proposes three new approaches to address these constraints: speeded-up SIFT feature matching, robust SIFT feature matching, and the inclusion of a closed-loop control structure in object recognition and pose estimation systems. The proposed methods are implemented and tested on the FRIEND II/III service robotic system, and the achieved results are valuable for adapting the SIFT algorithm to robot vision applications.
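For context, here is a minimal sketch of baseline SIFT extraction and Lowe's ratio-test matching with an approximate FLANN index, the standard way to trade exact nearest-neighbour search for speed. The KD-tree parameters and the 0.75 ratio are conventional defaults, not the dissertation's specific speed-up or robustness methods.

```python
# Minimal sketch: SIFT keypoints + approximate matching with a ratio test.
import cv2

sift = cv2.SIFT_create()

def match_features(img1, img2, ratio=0.75):
    """img1, img2: grayscale uint8 images -> keypoints and good matches."""
    kp1, des1 = sift.detectAndCompute(img1, None)
    kp2, des2 = sift.detectAndCompute(img2, None)
    # FLANN KD-tree index: approximate but much faster than brute force
    # on the high-dimensional (128-d) SIFT descriptors.
    flann = cv2.FlannBasedMatcher({"algorithm": 1, "trees": 5}, {"checks": 50})
    good = [m for m, n in flann.knnMatch(des1, des2, k=2)
            if m.distance < ratio * n.distance]  # Lowe's ratio test
    return kp1, kp2, good
```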