11 research outputs found

    Pattern Recognition

    Pattern recognition is a very wide research field. It involves factors as diverse as sensors, feature extraction, pattern classification, decision fusion and applications. The signals processed are commonly one-, two- or three-dimensional; processing may run in real time or take hours or days; some systems look for one narrow object class, while others search huge databases for entries with at least a small amount of similarity. No single person can claim expertise across the whole field, which develops rapidly, updates its paradigms and encompasses several philosophical approaches. This book reflects this diversity by presenting a selection of recent developments within the area of pattern recognition and related fields. It covers theoretical advances in classification and feature extraction as well as application-oriented works. The authors of these 25 works present and advocate recent achievements of their research related to the field of pattern recognition.

    Visual Concept Detection in Images and Videos

    The rapidly increasing proliferation of digital images and videos leads to a situation where content-based search in multimedia databases becomes more and more important. A prerequisite for effective image and video search is to analyze and index media content automatically. Current approaches in the field of image and video retrieval focus on semantic concepts serving as an intermediate description to bridge the “semantic gap” between the data representation and the human interpretation. Due to the large complexity and variability in the appearance of visual concepts, the detection of arbitrary concepts represents a very challenging task. In this thesis, the following aspects of visual concept detection systems are addressed: First, enhanced local descriptors for mid-level feature coding are presented. Based on the observation that scale-invariant feature transform (SIFT) descriptors with different spatial extents yield large performance differences, a novel concept detection system is proposed that combines feature representations for different spatial extents using multiple kernel learning (MKL). A multi-modal video concept detection system is presented that relies on Bag-of-Words representations for visual and in particular for audio features. Furthermore, a method for the SIFT-based integration of color information, called color moment SIFT, is introduced. Comparative experimental results demonstrate the superior performance of the proposed systems on the Mediamill and on the VOC Challenge. Second, an approach is presented that systematically utilizes results of object detectors. Novel object-based features are generated based on object detection results using different pooling strategies. For videos, detection results are assembled to object sequences and a shot-based confidence score as well as further features, such as position, frame coverage or movement, are computed for each object class. 
These features are used as additional input for the support vector machine (SVM)-based concept classifiers. Thus, other related concepts can also profit from object-based features. Extensive experiments on the Mediamill, VOC and TRECVid Challenge show significant improvements in terms of retrieval performance not only for the object classes, but also in particular for a large number of indirectly related concepts. Moreover, it has been demonstrated that a few object-based features are beneficial for a large number of concept classes. On the VOC Challenge, the additional use of object-based features led to a superior performance for the image classification task of 63.8% mean average precision (AP). Furthermore, the generalization capabilities of concept models are investigated. It is shown that different source and target domains lead to a severe loss in concept detection performance. In these cross-domain settings, object-based features achieve a significant performance improvement. Since it is inefficient to run a large number of single-class object detectors, it is additionally demonstrated how a concurrent multi-class object detection system can be constructed to speed up the detection of many object classes in images. Third, a novel, purely web-supervised learning approach for modeling heterogeneous concept classes in images is proposed. Tags and annotations of multimedia data in the WWW are rich sources of information that can be employed for learning visual concepts. The presented approach is aimed at continuous long-term learning of appearance models and improving these models periodically. For this purpose, several components have been developed: a crawling component, a multi-modal clustering component for spam detection and subclass identification, a novel learning component, called “random savanna”, a validation component, an updating component, and a scalability manager. 
Only a single word describing the visual concept is required to initiate the learning process. Experimental results demonstrate the capabilities of the individual components. Finally, a generic concept detection system is applied to support interdisciplinary research efforts in the field of psychology and media science. The psychological research question addressed in the field of behavioral sciences is whether and how playing violent content in computer games may induce aggression. Therefore, novel semantic concepts, most notably “violence”, are detected in computer game videos to gain insights into the interrelationship of violent game events and the brain activity of a player. Experimental results demonstrate the excellent performance of the proposed automatic concept detection approach for such interdisciplinary research.
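
The pooling of object detection results into shot-level features described in this abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the function name `pool_detections`, the input layout and the choice of max pooling, mean pooling and frame coverage are assumptions made for illustration only.

```python
from collections import defaultdict


def pool_detections(frames, classes):
    """Pool per-frame detector scores into per-class shot features.

    frames:  list with one entry per frame; each entry is a list of
             (class_name, score) detections (hypothetical layout).
    classes: class names for which features should be produced.
    """
    n_frames = len(frames)
    per_class = defaultdict(list)  # best score per frame where the class fired
    for detections in frames:
        best = {}
        for name, score in detections:
            best[name] = max(score, best.get(name, 0.0))
        for name, score in best.items():
            per_class[name].append(score)

    features = {}
    for name in classes:
        scores = per_class.get(name, [])
        features[name] = {
            # max pooling: strongest detection anywhere in the shot
            "max": max(scores) if scores else 0.0,
            # mean pooling over all frames (frames without a hit count as 0)
            "mean": sum(scores) / n_frames if n_frames else 0.0,
            # fraction of frames in which the class was detected
            "coverage": len(scores) / n_frames if n_frames else 0.0,
        }
    return features


shot = [[("car", 0.9), ("car", 0.5)], [("person", 0.4)], [("car", 0.3)], []]
print(pool_detections(shot, ["car", "person", "dog"]))
```

The resulting per-class values could then be appended to the other feature vectors fed to a concept classifier; how the thesis combines them is not specified here.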

    Video tolling integrated solution

    Master's project work, Informatics Engineering (Software Engineering), Universidade de Lisboa, Faculdade de Ciências, 2020. The toll collection industry was established in the 7th century to finance and help maintain public roads through the payment of fees for their use. However, the advent of mass motor vehicle use, and the consequent increase in traffic, forced the industry to adapt to modern times, introducing a free-flow philosophy that complements the traditional stop for payment. Adopting this kind of measure was made possible by the development of optical character recognition technologies, which allow license plates to be identified, combined with the use of identifiers registered for each vehicle. However, the absence of a stop also means infractions by drivers who travel with obscured or hard-to-read license plates. It is therefore desirable to use complementary methods to help identify vehicles, such as recognition of their make and model (MMR). Optical character recognition systems for license plate identification are already implemented in the solutions designed by Accenture for its various clients in the area, which makes these new complementary methods an interesting addition to their robustness, reducing the additional costs of manually identifying license plates from captured images. The present work therefore aimed, first, to establish a proof of concept with an architectural model that would allow a vehicle make and model recognition system to be integrated with the previously developed information systems currently in use by clients.
For this model, a set of functional and non-functional requirements was also established, in order to minimise, as far as possible, losses in the performance and reliability of the current systems as a consequence of introducing this new MMR component. The requirements were defined using a modified version of the FURPS quality model, following the good practices defined by the development team of the Tolling Centre of Excellence (TCoE) at Accenture Portugal. In addition, the requirements were prioritised according to the MoSCoW rules. Capturing images of moving vehicles and subsequently classifying them poses challenges inherent to its complexity, so consideration was also given to the variability factors that must be taken into account when designing an MMR system. These factors were classified into three main areas: properties inherent to the image capture system (RSE), properties of the image capture event, and properties of the vehicle. The architecture proposed for a system that could be integrated with the existing ones builds on their architecture and is organised in four layers: data access (bottom layer), management and business rules, evaluation of results and growth of the available knowledge base, and matching (top layer). For this proof of concept, technologies were chosen that allow integration with the pre-existing Java systems without excessive additional integration effort. Accordingly, Python libraries were used for OpenCV, which provides image processing, and for Tensorflow, for machine learning activities.
Developing the proof of concept also involved testing hypotheses about the most advantageous way of recognising the vehicle make and model itself. Three hypotheses were considered, based on two distinct datasets. The first concept consisted of fingerprinting images from a dataset developed at Stanford University, containing 16185 images of light passenger vehicles in varied poses, which can be divided into 49 makes and 196 distinct models if the model years are distinguished. For this purpose, the AKAZE feature model was used and three distinct matching methods were tested: brute force with the ratio test described in the literature (for two ratios, 0.4 and 0.7), brute force with the cross-check function native to the libraries used, and FLANN. Membership of an image in a given category was then determined by establishing correspondences between its keypoints and the keypoints of the dataset images, testing several ranking algorithms to increase the likelihood of matching an image of the same class. Overall, the results showed relatively low precision, with none exceeding 20% for make or model recognition. However, two of the trials stood out by reaching 16.8% precision for make and 11.2% for model. These trials had characteristics in common: both used the brute-force method with a ratio of 0.4. The ranking methods differed, however: one used the maximum number of keypoints in common (MV), and the other a ratio between this number of common points and the number of existing keypoints (MR).
Of the two, the trial using the MR method was considered statistically more significant, as it had a higher Cohen's kappa coefficient than MV. The poor results obtained with this method prompted a different approach, particularly regarding the selection of the images to be compared, since the variability factors identified in the analysis were too strongly present in the Stanford dataset images. The vehicle grille was therefore identified as the region of interest (ROI), given its distinctive patterns and the presence of the logo identifying the vehicle's make. The aim of this new approach was to identify this ROI in order to extract it from the original image and then apply the fingerprinting algorithms discussed above. ROI detection was performed with cascade classifiers, which were tested with two different feature types: LBP, which are faster but less accurate, and Haar, which are more complex but also more reliable. The images obtained through identification and subsequent cropping were then analysed for the presence of a grille, detection of the grille or of other objects, and the degree of perfection of the detection. The choice of the ROI to crop was also evaluated with two algorithms: the total number of intersections between candidate ROIs, and a threshold on the number of candidates for a candidate ROI to be accepted or rejected (referred to as min-neighbours).
The cascades were trained on images not belonging to the Stanford dataset, to avoid biased classifications of images previously shown to the model, and for each feature type two sets of non-grille images (negative samples) were presented, which differed in size and were accordingly named Nsmall and Nbig. The best results were obtained with the Nsmall dataset, the threshold approach, and Haar features, with the grille detected in 81.1% of the cases in which it was actually present in the image. However, this detection was not entirely what would be desired, since, considering only perfect detection without external elements, precision dropped to 32.3%. Thus, despite the various angles from which this ROI detection and extraction was studied, it was decided not to proceed to fingerprinting, owing to time constraints and the low precision the system as a whole would be able to achieve. The last technique tested in this work was the use of convolutional neural networks (CNN). To obtain more reliable results for the type of image commonly captured by RSE in an open road tolling context, a new dataset was used, consisting of images captured in a real setting and provided by one of the TCoE's clients. Within this new image set, the choice was made to test only the vehicle make, with classification performed in binary form (belongs or does not belong to a given make) rather than as multi-class classification. The most prevalent makes in the supplied set, Opel and Peugeot, were considered. The first CNN results proved promising, with a precision of 88.9% for Opel and 95.3% for Peugeot.
However, when cross-validation tests were run to assess the models' generalisation power, a significant decrease was observed for both Opel (79.3%) and Peugeot (84.9%), suggesting that overfitting may have occurred when the models were computed. For this reason, new trials were carried out with completely new images for each model, yielding 55.7% for Opel and 57.4% for Peugeot. Thus, although these results are far from ideal, CNN appear to be the best route to an integrated vehicle recognition system, making their refinement and study a viable direction for possible future work in this area.
For a long time, tolling has served as a way to finance and maintain publicly used roads. In recent years, however, due to generalised vehicle use and consequent traffic demand, there has been a call for open-road tolling solutions, which rely on automatic vehicle identification systems that operate through transponders and automatic license plate recognition. In this context, recognising the make and model of a vehicle (MMR) may prove useful, especially when dealing with infractions. Intelligent automated license plate recognition systems have already been adopted by several Accenture clients, with this new feature being a potential point of interest for future developments. Therefore, the current project aimed to establish a potential means of integrating such a system with the already existing architecture, with requirements designed to ensure that its current reliability and performance would suffer as little impact as possible. Furthermore, several options were considered as candidates for the future development of an integrated MMR solution, namely image fingerprinting of a whole image, grille selection followed by localised fingerprinting, and the use of convolutional neural networks (CNN) for image classification.
Among these, CNN showed the most promising results, albeit using images in limited angle ranges, which mimic those found in captured tolling vehicle images, and performing binary rather than multi-class classification. Consequently, further work in this area should take these results into account and expand upon them, refining these models and introducing more complexity in the process.
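
The brute-force matching with a ratio test and the two ranking rules (MV and MR) mentioned in this abstract can be sketched as follows. This is an illustrative reconstruction, not the project's code: descriptors are modelled as small integers compared by Hamming distance (a common choice for binary descriptors), the function names are hypothetical, and MR is assumed to divide by the number of keypoints in the candidate image, which the abstract does not state explicitly.

```python
def hamming(a, b):
    """Hamming distance between two binary descriptors stored as ints."""
    return bin(a ^ b).count("1")


def ratio_test_matches(query, train, ratio=0.4):
    """Count query descriptors whose nearest neighbour in `train` is
    clearly better than the second nearest (ratio test)."""
    good = 0
    for q in query:
        dists = sorted(hamming(q, t) for t in train)
        if len(dists) >= 2 and dists[0] < ratio * dists[1]:
            good += 1
    return good


def rank_candidates(query, dataset, ratio=0.4, mode="MV"):
    """Rank dataset images for one query image, best first.

    dataset: dict mapping a label to that image's descriptor list.
    mode "MV": score is the number of good matches.
    mode "MR": score is good matches divided by the candidate's
               descriptor count (assumed interpretation).
    """
    scores = {}
    for label, train in dataset.items():
        good = ratio_test_matches(query, train, ratio)
        if mode == "MV":
            scores[label] = good
        else:
            scores[label] = good / len(train) if train else 0.0
    return sorted(scores, key=scores.get, reverse=True)


query = [0b11110000, 0b00001111]
dataset = {
    "opel": [0b11110000, 0b00001111, 0b10101010, 0b01010101, 0b11001100],
    "peugeot": [0b11110000, 0b11111111],
}
print(rank_candidates(query, dataset, mode="MV"))
print(rank_candidates(query, dataset, mode="MR"))
```

In this toy data the two rules disagree: the image with more keypoints collects more raw matches, while the normalised score favours the smaller candidate, which shows why the choice of ranking rule can change the predicted class.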

    Feature Learning for RGB-D Data

    RGB-D data has turned out to be a very useful representation for solving fundamental computer vision problems. It combines the advantages of the color image, which provides appearance information about an object, and the depth image, which is immune to variations in color, illumination, rotation angle and scale. With the invention of the low-cost Microsoft Kinect sensor, which was initially used for gaming and later became a popular device for computer vision, high-quality RGB-D data can be acquired easily. RGB-D images and video can facilitate a wide range of application areas, such as computer vision, robotics, construction and medical imaging. However, how to fuse RGB information and depth information is still an open problem in computer vision: simply concatenating RGB data and depth data is not enough, and more powerful fusion algorithms are needed. In this thesis, to explore more advantages of RGB-D data, we use some popular RGB-D datasets for deep feature learning algorithm evaluation, hyper-parameter optimization, local multi-modal feature learning, RGB-D data fusion and recognizing RGB information from RGB-D images: i) With the success of deep neural networks in computer vision, deep features from fused RGB-D data can be shown to give better results than RGB data alone. However, different deep learning algorithms show different performance on different RGB-D datasets. Through large-scale experiments that comprehensively evaluate the performance of deep feature learning models for RGB-D image/video classification, we conclude that RGB-D fusion methods using CNNs always outperform the other selected methods (DBNs, SDAE and LSTM). On the other hand, since LSTM can learn from experience to classify, process and predict time series, it achieved better performance than DBN and SDAE in video classification tasks.
ii) Hyper-parameter optimization can help researchers quickly choose an initial set of hyper-parameters for a new classification task, thus reducing the number of trials over the hyper-parameter space. We present a simple and efficient framework for improving the efficiency and accuracy of hyper-parameter optimization by considering the classification complexity of a particular dataset. We verify this framework on three real-world RGB-D datasets. Our analysis of the experiments confirms that the framework provides deeper insight into the relationship between dataset classification tasks and hyper-parameter optimization, and thus helps to quickly choose an accurate initial set of hyper-parameters for a new classification task. iii) We propose a new Convolutional Neural Network (CNN)-based local multi-modal feature learning framework for RGB-D scene classification. This method can effectively capture much of the local structure of RGB-D scene images and automatically learn a fusion strategy for the object-level recognition step, instead of simply training a classifier on top of features extracted from both modalities. Experiments on two popular datasets show that our method with local multi-modal CNNs greatly outperforms state-of-the-art approaches. Our method has the potential to improve RGB-D scene understanding. An extended evaluation shows that CNNs trained on a scene-centric dataset achieve an improvement on scene benchmarks compared to a network trained on an object-centric dataset. iv) We propose a novel method for RGB-D data fusion. We project raw RGB-D data into a complex space and then jointly extract features from the fused RGB-D images. Besides three observations about the fusion methods, the experimental results also show that our method achieves competitive performance against the classical SIFT.
v) We propose a novel method called adaptive Visual-Depth Embedding (aVDE), which first learns a compact shared latent space between two representations of labeled RGB and depth modalities in the source domain. The shared latent space then helps transfer the depth information to the unlabeled target dataset. Finally, aVDE matches features and reweights instances jointly across the shared latent space and the projected target domain to obtain an adaptive classifier. This method can utilize the additional depth information in the source domain and simultaneously reduce the domain mismatch between the source and target domains. On two real-world image datasets, the experimental results illustrate that the proposed method significantly outperforms the state-of-the-art methods.

    Computer Vision for Timber Harvesting

    Video content analysis for intelligent forensics

    The networks of surveillance cameras installed in public places and private territories continuously record video data with the aim of detecting and preventing unlawful activities. This increases the importance of video content analysis applications, either for real-time (i.e. analytic) or post-event (i.e. forensic) analysis. In this thesis, the primary focus is on four key aspects of video content analysis, namely: 1. moving object detection and recognition; 2. correction of colours in the video frames and recognition of colours of moving objects; 3. make and model recognition of vehicles and identification of their type; 4. detection and recognition of text information in outdoor scenes. To address the first issue, a framework is presented in the first part of the thesis that efficiently detects and recognizes moving objects in videos. The framework targets the problem of object detection in the presence of a complex background. The object detection part of the framework relies on a background modelling technique and a novel post-processing step in which the contours of the foreground regions (i.e. moving objects) are refined by classifying edge segments as belonging either to the background or to the foreground region. Further, a novel feature descriptor is devised for the classification of moving objects into humans, vehicles and background. The proposed feature descriptor captures the texture information present in the silhouette of foreground objects. To address the second issue, a framework for the correction and recognition of true colours of objects in videos is presented with novel noise reduction, colour enhancement and colour recognition stages. The colour recognition stage makes use of temporal information to reliably recognize the true colours of moving objects in multiple frames.
The proposed framework is specifically designed to perform robustly on videos that have poor quality because of surrounding illumination, camera sensor imperfection and artefacts due to high compression. In the third part of the thesis, a framework for vehicle make and model recognition and type identification is presented. As a part of this work, a novel feature representation technique for distinctive representation of vehicle images has emerged. The feature representation technique uses dense feature description and a mid-level feature encoding scheme to capture the texture in the frontal view of the vehicles. The proposed method is insensitive to minor in-plane rotation and skew within the image. The proposed framework can be extended to any number of vehicle classes without re-training. Another important contribution of this work is the publication of a comprehensive, up-to-date dataset of vehicle images to support future research in this domain. The problem of text detection and recognition in images is addressed in the last part of the thesis. A novel technique is proposed that exploits the colour information in the image for the identification of text regions. Apart from detection, the colour information is also used to segment characters from the words. The recognition of identified characters is performed using shape features and supervised learning. Finally, a lexicon-based alignment procedure is adopted to finalize the recognition of strings present in word images. Extensive experiments have been conducted on benchmark datasets to analyse the performance of the proposed algorithms. The results show that the proposed moving object detection and recognition technique outperformed well-known baseline techniques. The proposed framework for the correction and recognition of object colours in video frames achieved all the aforementioned goals.
The performance analysis of the vehicle make and model recognition framework on multiple datasets has shown the strength and reliability of the technique when used within various scenarios. Finally, the experimental results for the text detection and recognition framework on benchmark datasets have revealed the potential of the proposed scheme for accurate detection and recognition of text in the wild.

    Deliverable D1.4 Visual, text and audio information analysis for hypervideo, final release

    The technologies included in the first release of the WP1 multimedia analysis tools were extensively evaluated, using content from the LinkedTV scenarios and through participation in international benchmarking activities. On this basis, concrete decisions were made regarding the appropriateness and importance of each individual method or combination of methods. Combined with an updated list of information needs for each scenario, these decisions led to a new set of analysis requirements that had to be addressed through the release of the final set of WP1 analysis techniques. To this end, coordinated efforts in three directions, namely (a) the improvement of a number of methods in terms of accuracy and time efficiency, (b) the development of new technologies and (c) the definition of synergies between methods for obtaining new types of information via multimodal processing, resulted in the final set of multimedia analysis methods for video hyperlinking. Moreover, the different analysis modules developed have been integrated into a web-based infrastructure, allowing the fully automatic linking of the multitude of WP1 technologies and the overall LinkedTV platform.

    Effective and efficient visual description based on local binary patterns and gradient distribution for object recognition

    Cette thèse est consacrée au problème de la reconnaissance visuelle des objets basé sur l'ordinateur, qui est devenue un sujet de recherche très populaire et important ces dernières années grâce à ses nombreuses applications comme l'indexation et la recherche d'image et de vidéo , le contrôle d'accès de sécurité, la surveillance vidéo, etc. Malgré beaucoup d'efforts et de progrès qui ont été fait pendant les dernières années, il reste un problème ouvert et est encore considéré comme l'un des problèmes les plus difficiles dans la communauté de vision par ordinateur, principalement en raison des similarités entre les classes et des variations intra-classe comme occlusion, clutter de fond, les changements de point de vue, pose, l'échelle et l'éclairage. Les approches populaires d'aujourd'hui pour la reconnaissance des objets sont basé sur les descripteurs et les classiffieurs, ce qui généralement extrait des descripteurs visuelles dans les images et les vidéos d'abord, et puis effectue la classification en utilisant des algorithmes d'apprentissage automatique sur la base des caractéristiques extraites. Ainsi, il est important de concevoir une bonne description visuelle, qui devrait être à la fois discriminatoire et efficace à calcul, tout en possédant certaines propriétés de robustesse contre les variations mentionnées précédemment. Dans ce contexte, l objectif de cette thèse est de proposer des contributions novatrices pour la tâche de la reconnaissance visuelle des objets, en particulier de présenter plusieurs nouveaux descripteurs visuelles qui représentent effectivement et efficacement le contenu visuel d image et de vidéo pour la reconnaissance des objets. Les descripteurs proposés ont l'intention de capturer l'information visuelle sous aspects différents. 
Tout d'abord, nous proposons six caractéristiques LBP couleurs de multi-échelle pour traiter les défauts principaux du LBP original, c'est-à-dire, le déffcit d'information de couleur et la sensibilité aux variations des conditions d'éclairage non-monotoniques. En étendant le LBP original à la forme de multi-échelle dans les différents espaces de couleur, les caractéristiques proposées non seulement ont plus de puissance discriminante par l'obtention de plus d'information locale, mais possèdent également certaines propriétés d'invariance aux différentes variations des conditions d éclairage. En plus, leurs performances sont encore améliorées en appliquant une stratégie de l'image division grossière à fine pour calculer les caractéristiques proposées dans les blocs d'image afin de coder l'information spatiale des structures de texture. Les caractéristiques proposées capturent la distribution mondiale de l information de texture dans les images. Deuxièmement, nous proposons une nouvelle méthode pour réduire la dimensionnalité du LBP appelée la combinaison orthogonale de LBP (OC-LBP). Elle est adoptée pour construire un nouveau descripteur local basé sur la distribution en suivant une manière similaire à SIFT. Notre objectif est de construire un descripteur local plus efficace en remplaçant l'information de gradient coûteux par des patterns de texture locales dans le régime du SIFT. Comme l'extension de notre première contribution, nous étendons également le descripteur OC-LBP aux différents espaces de couleur et proposons six descripteurs OC-LBP couleurs pour améliorer la puissance discriminante et la propriété d'invariance photométrique du descripteur basé sur l'intensité. Les descripteurs proposés capturent la distribution locale de l information de texture dans les images. Troisièmement, nous introduisons DAISY, un nouveau descripteur local rapide basé sur la distribution de gradient, dans le domaine de la reconnaissance visuelle des objets. 
[...]This thesis is dedicated to the problem of machine-based visual object recognition, which has become a very popular and important research topic in recent years because of its wide range of applications such as image/video indexing and retrieval, security access control, video monitoring, etc. Despite a lot of e orts and progress that have been made during the past years, it remains an open problem and is still considered as one of the most challenging problems in computer vision community, mainly due to inter-class similarities and intra-class variations like occlusion, background clutter, changes in viewpoint, pose, scale and illumination. The popular approaches for object recognition nowadays are feature & classifier based, which typically extract visual features from images/videos at first, and then perform the classification using certain machine learning algorithms based on the extracted features. Thus it is important to design good visual description, which should be both discriminative and computationally efficient, while possessing some properties of robustness against the previously mentioned variations. In this context, the objective of this thesis is to propose some innovative contributions for the task of visual object recognition, in particular to present several new visual features / descriptors which effectively and efficiently represent the visual content of images/videos for object recognition. The proposed features / descriptors intend to capture the visual information from different aspects. Firstly, we propose six multi-scale color local binary pattern (LBP) features to deal with the main shortcomings of the original LBP, namely deficiency of color information and sensitivity to non-monotonic lighting condition changes. 
By extending the original LBP to a multi-scale form in different color spaces, the proposed features not only have more discriminative power by obtaining more local information, but also possess certain invariance properties to different lighting condition changes. In addition, their performance is further improved by applying a coarse-to-fine image division strategy for calculating the proposed features within image blocks in order to encode the spatial information of texture structures. The proposed features capture the global distribution of texture information in images. Secondly, we propose a new dimensionality reduction method for LBP called the orthogonal combination of local binary patterns (OC-LBP), and adopt it to construct a new distribution-based local descriptor in a way similar to SIFT. Our goal is to build a more efficient local descriptor by replacing the costly gradient information with local texture patterns in the SIFT scheme. As an extension of our first contribution, we also extend the OC-LBP descriptor to different color spaces and propose six color OC-LBP descriptors to enhance the discriminative power and the photometric invariance of the intensity-based descriptor. The proposed descriptors capture the local distribution of texture information in images. Thirdly, we introduce DAISY, a new fast local descriptor based on gradient distribution, to the domain of visual object recognition. LYON-Ecole Centrale (690812301) / Sudoc, France
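
The dimensionality reduction behind OC-LBP can be illustrated with a minimal sketch. The abstract does not spell out the construction, so the code below rests on an assumption: the eight neighbours of a pixel are split into two orthogonal groups of four (horizontal/vertical and diagonal), each group yields a 4-bit pattern, and the two 16-bin histograms are concatenated into 32 bins instead of the 256 bins of a standard 8-neighbour LBP. The function names and the bit ordering are hypothetical.

```python
def oc_lbp_codes(img, r, c):
    """Return two 4-bit codes for pixel (r, c) of a 2-D grey image (list of lists)."""
    center = img[r][c]
    # Assumed grouping: horizontal/vertical neighbours, then diagonal neighbours.
    cross = [(0, 1), (-1, 0), (0, -1), (1, 0)]
    diag = [(-1, 1), (-1, -1), (1, -1), (1, 1)]

    def code(offsets):
        value = 0
        for bit, (dr, dc) in enumerate(offsets):
            # A neighbour at least as bright as the centre sets its bit.
            if img[r + dr][c + dc] >= center:
                value |= 1 << bit
        return value

    return code(cross), code(diag)


def oc_lbp_histogram(img):
    """Concatenate the two 16-bin histograms over all interior pixels (32 bins)."""
    hist = [0] * 32
    for r in range(1, len(img) - 1):
        for c in range(1, len(img[0]) - 1):
            cross_code, diag_code = oc_lbp_codes(img, r, c)
            hist[cross_code] += 1
            hist[16 + diag_code] += 1
    return hist


if __name__ == "__main__":
    patch = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    print(oc_lbp_codes(patch, 1, 1))
    print(oc_lbp_histogram(patch))
```

A SIFT-like descriptor in this spirit would compute one such histogram per spatial cell of a local region and concatenate the cells; that step is left out here.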

    WEATHER LORE VALIDATION TOOL USING FUZZY COGNITIVE MAPS BASED ON COMPUTER VISION

    Published Thesis. The creation of scientific weather forecasts is troubled by many technological challenges (Stern & Easterling, 1999) while their utilization is generally dismal. Consequently, the majority of small-scale farmers in Africa continue to consult some form of weather lore to reach various cropping decisions (Baliscan, 2001). Weather lore is a body of informal folklore (Enock, 2013), associated with the prediction of the weather, and based on indigenous knowledge and human observation of the environment. As such, it tends to be more holistic, and more localized to the farmers’ context. However, weather lore has limitations; for instance, it cannot offer forecasts beyond a season. Different types of weather lore exist, utilizing almost all available human senses (feel, smell, sight and hearing). Of all the types of weather lore in existence, it is the visual or observed weather lore that is most used by indigenous societies to come up with weather predictions. On the other hand, meteorologists continue to treat this knowledge as superstition, partly because there is no means to scientifically evaluate and validate it. The visualization and characterization of visual sky objects (such as moon, clouds, stars, and rainbows) in forecasting weather are significant subjects of research. To realize the integration of visual weather lore in modern weather forecasting systems, there is a need to represent and scientifically substantiate this form of knowledge. This research was aimed at developing a method for verifying visual weather lore that is used by traditional communities to predict weather conditions. To realize this verification, fuzzy cognitive mapping was used to model and represent causal relationships between selected visual weather lore concepts and weather conditions. 
The traditional knowledge used to produce these maps was obtained through case studies of two communities (in Kenya and South Africa). These case studies were aimed at understanding the weather lore domain as well as the causal effects between meteorological and visual weather lore. In this study, common astronomical weather lore factors related to cloud physics were identified as: bright stars, dispersed clouds, dry weather, dull stars, feathery clouds, gathering clouds, grey clouds, high clouds, layered clouds, low clouds, stars, medium clouds, and rounded clouds. Relationships between the concepts were also identified and formally represented using fuzzy cognitive maps. In implementing the verification tool, machine vision was used to recognize sky objects captured using a sky camera, while pattern recognition was employed in benchmarking and scoring the objects. A wireless weather station was used to capture real-time weather parameters. The visualization tool was then designed and realized in the form of a software artefact, which integrated both computer vision and fuzzy cognitive mapping for experimenting with visual weather lore, and verification using various statistical forecast skills and metrics. The tool consists of four main sub-components: (1) Machine vision that recognizes sky objects using support vector machine classifiers with shape-based feature descriptors; (2) Pattern recognition, to benchmark and score objects using pixel orientations, Euclidean distance, Canny edge detection and the grey-level co-occurrence matrix; (3) Fuzzy cognitive mapping that was used to represent knowledge (i.e. the active Hebbian learning algorithm was used to learn until convergence); and (4) A statistical computing component that was used for verification and forecast skills, including the Brier score and contingency tables for deterministic forecasts. 
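
As an illustration of the statistical component named in (4), here is a minimal sketch of the Brier score and of a 2x2 contingency table. It is not the thesis implementation: the function names, the 0.5 decision threshold and the sample values are assumptions chosen for the example.

```python
def brier_score(probabilities, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    pairs = list(zip(probabilities, outcomes))
    return sum((p - o) ** 2 for p, o in pairs) / len(pairs)


def contingency_table(probabilities, outcomes, threshold=0.5):
    """Count hits, misses, false alarms and correct negatives.

    A forecast counts as "yes" when its probability reaches the
    (assumed) threshold; outcomes are 1 if the event occurred, else 0.
    """
    table = {"hits": 0, "misses": 0, "false_alarms": 0, "correct_negatives": 0}
    for p, o in zip(probabilities, outcomes):
        forecast_yes = p >= threshold
        if forecast_yes and o == 1:
            table["hits"] += 1
        elif not forecast_yes and o == 1:
            table["misses"] += 1
        elif forecast_yes and o == 0:
            table["false_alarms"] += 1
        else:
            table["correct_negatives"] += 1
    return table


if __name__ == "__main__":
    rain_probability = [0.9, 0.2, 0.7, 0.1]  # hypothetical forecasts
    rain_observed = [1, 1, 0, 0]             # hypothetical observations
    print(brier_score(rain_probability, rain_observed))
    print(contingency_table(rain_probability, rain_observed))
```

A Brier score of 0 means every forecast probability matched the outcome exactly; larger values indicate worse probabilistic forecasts.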
Rigorous evaluation of the verification tool was carried out using independent (not used in the training and testing phases) real-time images from Bloemfontein, South Africa, and Voi, Kenya. The real-time images were captured using a sky camera with GPS location services. The results of the implementation were tested for the selected weather conditions (for example, rain, heat, cold, and dry conditions), and found to be acceptable (the verified prediction accuracies were over 80%). The recommendation in this study is to apply the implemented method for processing tasks, towards verifying all other types of visual weather lore. In addition, the use of the method developed also requires the implementation of modules for processing and verifying other types of weather lore, such as sounds, and symbols of nature. Since time immemorial, from Australia to Asia, Africa to Latin America, local communities have continued to rely on weather lore observations to predict seasonal weather as well as its effects on their livelihoods (Alcock, 2014). This is mainly based on many years of personal experience in observing weather conditions. However, when it comes to predictions for longer lead-times (i.e. over a season), weather lore is uncertain (Hornidge & Antweiler, 2012). This uncertainty has partly contributed to the current status where meteorologists and other scientists continue to treat weather lore as superstition (United-Nations, 2004), and as not capable of predicting weather. One of the problems in testing the confidence in weather lore for predicting weather is the wide variety of weather lore found in the details of indigenous sayings, which are tightly coupled to locality and pattern variations (Oviedo et al., 2008). This traditional knowledge is entrenched within the day-to-day socio-economic activities of the communities using it and is not globally available for comparison and validation (Huntington, Callaghan, Fox, & Krupnik, 2004). 
Further, this knowledge is based on local experience that lacks benchmarking techniques; so that harmonizing and integrating it within the science-based weather forecasting systems is a daunting task (Hornidge & Antweiler, 2012). It is partly for this reason that the question of validation of weather lore has not yet been substantially investigated. Sufficient expanded processes of gathering weather observations, combined with comparison and validation, can produce some useful information. Since forecasting weather accurately is a challenge even with the latest supercomputers (BBC News Magazine, 2013), validated weather lore can be useful if it is incorporated into modern weather prediction systems. Validation of traditional knowledge is a necessary step in the management of building integrated knowledge-based systems. Traditional knowledge incorporated into knowledge-based systems has to be verified for enhancing systems’ reliability. Weather lore knowledge exists in different forms as identified by traditional communities; hence it needs to be tied together for comparison and validation. The development of a weather lore validation tool that can integrate a framework for acquiring weather data and methods of representing the weather lore in verifiable forms can be a significant step in the validation of weather lore against actual weather records using conventional weather-observing instruments. The success of validating weather lore could stimulate the opportunity for integrating acceptable weather lore with modern systems of weather prediction to improve actionable information for decision making that relies on seasonal weather prediction. In this study a hybrid method is developed that includes computer vision and fuzzy cognitive mapping techniques for verifying visual weather lore. The verification tool was designed with forecasting based on mimicking visual perception, and fuzzy thinking based on the cognitive knowledge of humans. 
The method provides meaning to humanly perceivable sky objects so that computers can understand, interpret, and approximate visual weather outcomes. Questionnaires were administered in two case study locations (KwaZulu-Natal province in South Africa, and Taita-Taveta County in Kenya), between the months of March and July 2015. The two case studies were conducted by interviewing respondents on how visual astronomical and meteorological weather concepts cause weather outcomes. The two case studies were used to identify causal effects of visual astronomical and meteorological objects to weather conditions. This was followed by finding variations and comparisons, between the visual weather lore knowledge in the two case studies. The results from the two case studies were aggregated in terms of seasonal knowledge. The causal links between visual weather concepts were investigated using these two case studies; results were compared and aggregated to build up common knowledge. The joint averages of the majority of responses from the case studies were determined for each set of interacting concepts. The modelling of the weather lore verification tool consists of input, processing components and output. The input data to the system are sky image scenes and actual weather observations from wireless weather sensors. The image recognition component performs three sub-tasks, including: detection of objects (concepts) from image scenes, extraction of detected objects, and approximation of the presence of the concepts by comparing extracted objects to ideal objects. The prediction process involves the use of approximated concepts generated in the recognition component to simulate scenarios using the knowledge represented in the fuzzy cognitive maps. The verification component evaluates the variation between the predictions and actual weather observations to determine prediction errors and accuracy. 
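
The prediction step described above, where approximated concept activations are propagated through a fuzzy cognitive map, can be sketched as follows. This is a generic FCM inference loop with a sigmoid squashing function and fixed weights; the concept names are taken from the abstract, but the weight values, the update rule variant and the stopping tolerance are assumptions, and the active Hebbian learning used in the thesis to learn the weights is not shown.

```python
import math


def sigmoid(x, steepness=1.0):
    """Squash an activation into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-steepness * x))


def fcm_step(state, weights):
    """One update: each concept keeps its own value and adds weighted inputs.

    weights[i][j] is the causal influence of concept i on concept j.
    """
    n = len(state)
    return [
        sigmoid(state[j] + sum(weights[i][j] * state[i] for i in range(n) if i != j))
        for j in range(n)
    ]


def fcm_run(state, weights, tol=1e-5, max_iter=100):
    """Iterate until activations change by less than tol, or max_iter is reached."""
    for _ in range(max_iter):
        nxt = fcm_step(state, weights)
        if max(abs(a - b) for a, b in zip(nxt, state)) < tol:
            return nxt
        state = nxt
    return state


if __name__ == "__main__":
    concepts = ["gathering clouds", "grey clouds", "rain"]
    # Hypothetical causal weights: both cloud concepts promote rain.
    weights = [
        [0.0, 0.0, 0.8],
        [0.0, 0.0, 0.7],
        [0.0, 0.0, 0.0],
    ]
    initial = [0.9, 0.8, 0.0]  # activations from the recognition component
    for name, value in zip(concepts, fcm_run(initial, weights)):
        print(name, round(value, 3))
```

The final activation of an outcome concept such as rain can then be read as the predicted degree of that weather outcome and passed to the verification component.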
To evaluate the tool, daily system simulations were run to predict and record probabilities of weather outcomes (i.e. rain, heat index/hotness, dry, cold index). Weather observations were captured periodically using a wireless weather station. This process was repeated several times until there was sufficient data to use for the verification process. To match the range of the predicted weather outcomes, the actual weather observations (measurements) were transformed and normalized to the range [0, 1]. In the verification process, comparisons were made between the actual observations and the predicted weather outcome values by computing residuals (error values) from the observations. The error values and the squared errors were used to compute the Mean Squared Error (MSE) and the Root Mean Squared Error (RMSE) for each predicted weather outcome. Finally, the validity of the visual weather lore verification model was assessed using data from a different geographical location. Actual data in the form of daily sky scenes and weather parameters were acquired from Voi, Kenya, from December 2015 to January 2016. The results on the use of hybrid techniques for verification of weather lore are expected to provide an incentive for integrating indigenous knowledge on weather with modern numerical weather prediction systems for accurate and downscaled weather forecasts.
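
The verification arithmetic described above can be sketched in a few lines. The abstract does not say which transformation was used to map observations to [0, 1], so min-max scaling with known lower and upper bounds is assumed here; the bounds and sample values are hypothetical.

```python
import math


def min_max_normalize(values, lower, upper):
    """Scale raw observations to [0, 1] given assumed physical bounds."""
    return [(v - lower) / (upper - lower) for v in values]


def mean_squared_error(predicted, observed):
    """Average of squared residuals (observation minus prediction)."""
    residuals = [o - p for p, o in zip(predicted, observed)]
    return sum(e * e for e in residuals) / len(residuals)


def root_mean_squared_error(predicted, observed):
    """Square root of the MSE, in the same units as the normalized values."""
    return math.sqrt(mean_squared_error(predicted, observed))


if __name__ == "__main__":
    rainfall_mm = [0.0, 10.0, 20.0]       # hypothetical station readings
    predicted_rain = [0.1, 0.4, 0.9]      # hypothetical tool outputs in [0, 1]
    observed_rain = min_max_normalize(rainfall_mm, 0.0, 20.0)
    print(mean_squared_error(predicted_rain, observed_rain))
    print(root_mean_squared_error(predicted_rain, observed_rain))
```

In the workflow described in the abstract, these two measures would be computed separately for each weather outcome (rain, heat, dry, cold).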