868 research outputs found

    Learning capsules for vehicle logo recognition

    Vehicle logo recognition is an important part of vehicle identification in intelligent transportation systems. State-of-the-art vehicle logo recognition approaches use automatically learned features from Convolutional Neural Networks (CNNs). However, CNNs do not perform well when images are rotated and very noisy. This paper proposes an image recognition framework with a capsule network. A capsule is a group of neurons, whose length can represent the existence probability of an entity or part of an entity. The orientation of a capsule contains information about the instantiation parameters such as positions and orientations. Capsules are learned by a routing process, which is more effective than the pooling process in CNNs. This paper, for the first time, develops a capsule learning framework in the field of intelligent transportation systems. By testing with the largest publicly available vehicle logo dataset, the proposed framework gives a quick solution and achieves the highest accuracy (100%) on this dataset. The learning capsules have been tested with different image changes such as rotation and occlusion. Image degradations including blurring and noise effects are also considered, and the proposed framework has proven to be superior to CNNs
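The idea that a capsule's length encodes an existence probability while its orientation encodes instantiation parameters can be illustrated with a small NumPy sketch. It assumes the "squash" nonlinearity commonly used in capsule networks; the abstract does not say which nonlinearity the paper uses, and the names `squash` and `existence_probability` are chosen here for illustration only.

```python
import numpy as np

def squash(s, eps=1e-9):
    """Shrink a capsule vector so its length lies in [0, 1) and its direction is kept.

    Assumed form: v = (|s|^2 / (1 + |s|^2)) * s / |s|.
    """
    sq_norm = np.sum(s * s, axis=-1, keepdims=True)
    scale = sq_norm / (1.0 + sq_norm)
    # eps avoids a division by zero for an all-zero capsule.
    return scale * s / np.sqrt(sq_norm + eps)

def existence_probability(capsule):
    """Read the length of a squashed capsule as the probability that the entity exists."""
    return float(np.linalg.norm(capsule))

raw = np.array([3.0, 4.0])       # unnormalised capsule output, length 5
v = squash(raw)
p = existence_probability(v)     # 25 / 26, close to 1
```

Long raw vectors map to lengths near 1 and short ones to lengths near 0, while the direction of the vector is left unchanged.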

    Machine Learning Methods for Autonomous Object Recognition and Restoration in Images

    Image recognition and image restoration are important tasks in the field of image processing. Image recognition is becoming very popular thanks to state-of-the-art deep learning methods. However, these models usually require big datasets and high computational costs, which can be challenging. This thesis proposes an online learning framework that deals with both small and big datasets. For small datasets, a Cauchy prior logistic regression classifier is proposed to provide quick convergence, and the online weight updating scheme is efficient because previously trained weights are reused. For big datasets, a convolutional neural network could be implemented. Non-parametric classifiers such as K-nearest neighbours are often used for image recognition; however, K-nearest neighbours is vulnerable to noise and high-dimensional features. This thesis proposes a non-parametric classifier based on Bayesian compressive sensing; the developed classifier is robust and does not need a training stage. Image restoration is usually performed before image recognition as a preprocessing step. This thesis instead proposes a joint framework that performs image recognition and restoration simultaneously. In image restoration, image rotation and occlusion are common problems, but convolutional neural networks are not well suited to them because of the limitations of the convolution and pooling processes. This thesis therefore develops a joint framework based on capsule networks. The developed joint capsule framework can achieve good results on recognition, image de-noising, rotation recovery and occlusion removal. The developed algorithms have been evaluated for vehicle logo restoration and recognition; however, they are transferable to other applications. This thesis also develops, for the first time, an automatic detection and recognition framework for badger monitoring.
Badgers play a key role in the transmission of bovine tuberculosis, which is described by the government as the most pressing animal health problem in the UK. An automatic badger monitoring system could help researchers to understand the transmission mechanisms and thereby develop methods to deal with transmission between species.

    A deep learning framework for joint image restoration and recognition

    Image restoration and recognition are important computer vision tasks representing an inherent part of autonomous systems. These two tasks are often implemented sequentially, with the restoration process followed by recognition. In contrast, this paper proposes a joint framework that performs both tasks simultaneously within a shared deep neural network architecture. This joint framework integrates the restoration and recognition tasks by incorporating: i) common layers, ii) restoration layers and iii) classification layers. The total loss function combines the restoration and classification losses. The proposed joint framework, based on capsules, provides an efficient solution that can cope with challenges due to noise, image rotations and occlusions. The developed framework has been validated and evaluated on a public vehicle logo dataset under various degradation conditions, including Gaussian noise, rotation and occlusion. The results show that the joint framework improves accuracy compared with the single-task networks.
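A combined loss of the kind this abstract describes can be sketched as a weighted sum of a classification term and a restoration term. The abstract does not give the exact form, so cross-entropy, mean squared error and the `restoration_weight` parameter below are all assumptions made for illustration, not the authors' formulation.

```python
import numpy as np

def cross_entropy(probs, label, eps=1e-12):
    """Classification loss: negative log-probability of the true class (assumed form)."""
    return float(-np.log(probs[label] + eps))

def mse(restored, clean):
    """Restoration loss: mean squared error between restored and clean images (assumed form)."""
    return float(np.mean((restored - clean) ** 2))

def joint_loss(probs, label, restored, clean, restoration_weight=0.5):
    """Total loss = classification loss + restoration_weight * restoration loss.

    restoration_weight is a hypothetical hyperparameter balancing the two tasks.
    """
    return cross_entropy(probs, label) + restoration_weight * mse(restored, clean)

probs = np.array([0.1, 0.9])      # predicted class probabilities
restored = np.zeros((4, 4))       # output of the restoration branch
clean = np.ones((4, 4))           # ground-truth clean image
total = joint_loss(probs, 1, restored, clean)
```

Because both terms depend on the shared layers, minimising the sum lets gradients from each task shape the common features.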

    Learning Attention Mechanisms and Context: An Investigation into Vision and Emotion

    Attention mechanisms for context modelling are becoming ubiquitous in neural architectures in machine learning. The attention mechanism is a technique that filters out information that is irrelevant to a given task and focuses on learning task-dependent fixation points or regions. Furthermore, attention mechanisms raise a question about a given task, i.e. 'what' to learn and 'where/how' to learn for task-specific context modelling. The context is the set of conditional variables instrumental in deciding the categorical distribution for the given data. Also, why is learning task-specific context necessary? In order to answer these questions, context modelling with attention in the vision and emotion domains is explored in this thesis using attention mechanisms with different hierarchical structures. The three main goals of this thesis are building superior classifiers using attention-based deep neural networks (DNNs), investigating the role of context modelling in the given tasks, and developing a framework for interpreting hierarchies and attention in deep attention networks. In the vision domain, gesture and posture recognition tasks in diverse environments are chosen. In the emotion domain, visual and speech emotion recognition tasks are chosen. These tasks are selected for their sequential properties for modelling a spatiotemporal context. One of the key challenges from a machine learning standpoint is to extract patterns which bear maximum correlation with the information encoded in a signal while being as insensitive as possible to other types of information carried by the signal. A possible way to overcome this problem is to learn task-dependent representations. In order to achieve that, novel spatiotemporal context modelling networks and the mixture of multi-view attention (MOMA) networks are proposed using bidirectional long short-term memory network (BLSTM), convolutional neural network (CNN), Capsule and attention networks.
A framework has been proposed to interpret the internal attention states with respect to the given task. The results of the classifiers in the assigned tasks are compared with the state-of-the-art DNNs, and the proposed classifiers achieve superior results. The context in speech emotion recognition is explored deeply with the attention interpretation framework, and it shows that the proposed model can assign word importance based on acoustic context. Furthermore, it has been observed that the internal states of the attention bear correlation with human perception of acoustic cues for speech emotion recognition. Overall, the results demonstrate superior classifiers and context learning models with interpretable frameworks. The findings are very important for speech emotion recognition systems. In this thesis, not only are better models produced, but the interpretability of those models is also explored, and their internal states are analysed. The phones and words are aligned with the attention vectors, and it is seen that vowel sounds are more important for defining emotion acoustic cues than consonants, and that the model can assign word importance based on acoustic context. How these approaches use word importance to predict emotions is also demonstrated by visualising the attention weights over the words. In a broader perspective, the findings from the thesis about gesture, posture and emotion recognition may be helpful in tasks like human-robot interaction (HRI) and conversational artificial agents (such as Siri, Alexa). The communication is grounded in symbolic and sub-symbolic cues of intent, whether visual, audio or haptic. The understanding of intent depends heavily on reasoning about the situational context. Emotion, i.e. speech and visual emotion, provides context to a situation, and it is a deciding factor in the response generation.
Emotional intelligence and information from vision, audio and other modalities are essential for making human-human and human-robot communication more natural and feedback-driven
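The word-importance weights that this abstract describes can be illustrated with a generic attention pooling step. This is only a sketch under stated assumptions: dot-product scoring against a single `query` vector is assumed here, the function name `attention_pool` is hypothetical, and the thesis's own MOMA architecture is not reproduced.

```python
import numpy as np

def attention_pool(features, query):
    """Weight each time step (e.g. each word) by its relevance to the task.

    features: array of shape (steps, dim), one feature vector per step.
    query:    array of shape (dim,), an assumed learned task vector.
    Returns the attention weights over steps and the weighted context vector.
    """
    scores = features @ query                # one relevance score per step
    scores = scores - scores.max()           # shift for numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()   # softmax over steps
    context = weights @ features             # weighted sum of step features
    return weights, context

features = np.array([[1.0, 0.0],
                     [0.0, 1.0],
                     [1.0, 1.0]])
query = np.array([2.0, 0.0])
weights, context = attention_pool(features, query)
```

The weights sum to one, so plotting them over the words of an utterance gives the kind of visualisation the abstract refers to; steps that score low against the query contribute little to the context vector.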

    Artificial neural network and its applications in quality process control, document recognition and biomedical imaging

    In a computer-vision based system, a digital image obtained by a digital camera is usually a 24-bit color image. The analysis of an image with that many levels might require complicated image processing techniques and higher computational costs. But in a real-time application, where a part has to be inspected within a few milliseconds, we either have to reduce the image to a more manageable number of gray levels, usually two (a binary image), while retaining all necessary features of the original image, or develop a complicated technique. A binary image can be obtained by thresholding the original image into two levels. Therefore, thresholding a given image into a binary image is a necessary step for most image analysis and recognition techniques. In this thesis, we have studied the effectiveness of using artificial neural networks (ANN) in pharmaceutical, document recognition and biomedical imaging applications for image thresholding and classification purposes. Finally, we have developed edge-based, ANN-based and region-growing based image thresholding techniques to extract low-contrast objects of interest and classify them into their respective classes in those applications. Real-time quality inspection of gelatin capsules in pharmaceutical applications is an important issue from the point of view of industry's productivity and competitiveness. A computer vision-based automatic quality inspection and controller system is one of the solutions to this problem. Machine vision systems provide quality control and real-time feedback for industrial processes, overcoming the physical limitations and subjective judgment of humans. In this thesis, we have developed an image processing system using edge-based image thresholding techniques for quality inspection that satisfies the industrial requirements in pharmaceutical applications for separating accepted from rejected capsules.
In the document recognition application, the success of OCR mostly depends on the quality of the thresholded image. Non-uniform illumination, low contrast and complex backgrounds make this application challenging. In this thesis, optimal parameters for an ANN-based local thresholding approach for gray-scale composite document images with non-uniform backgrounds are proposed. An exhaustive search was conducted to select the optimal features, and it found that pixel value, mean and entropy are the most significant features at a window size of 3x3 in this application. For other applications the optimal features might differ, but the procedure for finding them is the same. The average recognition rate of 99.25% shows that the proposed 3 features at window size 3x3 are optimal in terms of recognition rate and PSNR compared to the ANN-based thresholding techniques with different parameters presented in the literature. In the biomedical imaging application, breast cancer continues to be a public health problem. In this thesis we present a computer aided diagnosis (CAD) system for mass detection and classification in digitized mammograms, which performs mass detection on regions of interest (ROI) followed by benign-malignant classification of the detected masses. A three-layer ANN with seven features is proposed for classifying the marked regions as benign or malignant; it achieves 90.91% sensitivity and 83.87% specificity, which is very promising compared to the radiologist's sensitivity of 75%.
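The three local features named in this abstract (pixel value, mean and entropy over a 3x3 window) could be computed per pixel roughly as follows. This is a sketch under assumptions: the abstract does not define the entropy, so Shannon entropy of the gray levels inside the window is assumed, as is edge-replicate padding at the image border; `window_features` is a hypothetical helper name.

```python
import numpy as np

def window_features(image, row, col, size=3):
    """Return (pixel value, window mean, window entropy) for one pixel.

    image: 2-D array of gray levels. size: odd window width (3 for a 3x3 window).
    """
    half = size // 2
    # Assumption: replicate edge pixels so border pixels also get a full window.
    padded = np.pad(image, half, mode="edge")
    # Pixel (row, col) sits at (row + half, col + half) in the padded image,
    # so this slice is the window centred on it.
    window = padded[row:row + size, col:col + size]
    # Assumption: Shannon entropy (in bits) of the gray levels in the window.
    _, counts = np.unique(window, return_counts=True)
    p = counts / counts.sum()
    entropy = float(-(p * np.log2(p)).sum())
    return float(image[row, col]), float(window.mean()), entropy

image = np.array([[0, 0, 0],
                  [0, 255, 0],
                  [0, 0, 0]], dtype=np.uint8)
value, mean, entropy = window_features(image, 1, 1)
```

A feature vector like this, computed for every pixel, would be the input to a classifier that decides whether the pixel is foreground or background.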

    American sign language posture understanding with deep neural networks

    Sign language is a visually oriented, natural, nonverbal communication medium. Sharing similar linguistic properties with its respective spoken language, it consists of a set of gestures, postures and facial expressions. Although sign language is a mode of communication between deaf people, most other people do not know how to interpret it. Therefore, it would be constructive if sign postures could be translated artificially. In this paper, a capsule-based deep neural network sign posture translator for American Sign Language (ASL) fingerspelling (posture) is presented. The performance validation shows that the approach can successfully identify sign language, with an accuracy of around 99%. Unlike previous neural network approaches, which mainly used fine-tuning and transfer learning from pre-trained models, the developed capsule network architecture does not require a pre-trained model. The framework uses a capsule network with adaptive pooling, which is the key to its high accuracy. The framework is not limited to sign language understanding; it also has scope for non-verbal communication in Human-Robot Interaction (HRI).

    July 30, 1990 Open Air

    Shawnee State University Student Newspaper https://digitalcommons.shawnee.edu/open_air/1128/thumbnail.jp

    The Djelk Ranger Program: an outsider’s perspective

    This report is the result of a ten-day general conceptualisation research trip in May 2003 into an Indigenous community to study the Djelk Ranger program operating under the Bawinanga Aboriginal Corporation (BAC). During this visit I spent time with several different groups of Rangers and visited several sustainable wildlife harvesting sites which are described here. The Djelk Ranger program established by the BAC is built on the extensive knowledge and skills that already exist within this Indigenous community. The success of the ventures mentioned in this report is built on a unique blend of formal legal institutional mechanisms and customary law and socio-cultural conventions. Cooperative community-based wildlife resource management and aquaculture have the potential to deliver sustainable and cost-effective development benefits for Indigenous landowners. Greater recognition of the valuable land management and biodiversity conservation roles undertaken by Indigenous people in these circumstances would seem appropriate, and it would be desirable for these roles to be reflected in more formal and sustained income arrangements than the current CDEP project funding. The opportunities for economic development in Indigenous communities, and some of the challenges that these communities face, are demonstrated in the Djelk Ranger program initiative. The BAC is an impressive institution for its commitment to learning, communication, cultural integration, and economic development. There is clearly a need for such adaptive and flexible institutions to provide a bridge between cultures and protect the interests of remote Indigenous communities.

    August 13, 1990 Open Air

    Shawnee State University Student Newspaper https://digitalcommons.shawnee.edu/open_air/1129/thumbnail.jp