
    FINE-GRAINED OBJECT DETECTION

    Object detection plays a vital role in many real-world computer vision applications, such as self-driving cars, unstaffed stores and general-purpose robotic systems. Convolutional Neural Network (CNN)-based deep learning has become the backbone of most computer vision algorithms, including object detection. Most research has focused on detecting objects that differ significantly, e.g., a car, a person and a bird. The next step beyond general object detection is fine-grained object detection: distinguishing different types within a single object class. Fine-grained object detection is crucial for tasks such as automated retail checkout. This research developed deep learning models to detect 200 bird species of similar size and shape, trained and tested on the CUB-200-2011 dataset. By attaining a mean Average Precision (mAP) of 71.5%, we achieved, to the best of our knowledge, an improvement of 5.3 percentage points over the previous best mAP of 66.2%.
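The result above is reported as mean Average Precision (mAP), the mean over classes of per-class average precision (AP). The abstract does not state which AP protocol was used, so the sketch below assumes PASCAL VOC-style all-point interpolation and assumes detections have already been matched to ground truth as true or false positives. The function names are illustrative, not taken from the work.

```python
import numpy as np

def average_precision(scores, is_tp, num_gt):
    """All-point interpolated AP for one class (assumed VOC-style protocol).

    scores: confidence of each detection
    is_tp:  1 if the detection matched an unclaimed ground-truth box, else 0
    num_gt: number of ground-truth boxes of this class (must be > 0)
    """
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    tp = np.asarray(is_tp, dtype=float)[order]
    fp = 1.0 - tp
    tp_cum = np.cumsum(tp)
    fp_cum = np.cumsum(fp)
    recall = tp_cum / num_gt
    precision = tp_cum / (tp_cum + fp_cum)

    # Pad the curve, then make precision monotonically non-increasing in recall
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])

    # Sum the rectangle areas at the points where recall changes
    idx = np.where(mrec[1:] != mrec[:-1])[0]
    return float(np.sum((mrec[idx + 1] - mrec[idx]) * mpre[idx + 1]))

def mean_average_precision(per_class):
    """per_class: list of (scores, is_tp, num_gt) tuples, one per class."""
    return float(np.mean([average_precision(*c) for c in per_class]))
```

For example, three detections scored 0.9, 0.8 and 0.7 with match flags 1, 0, 1 against two ground-truth boxes give an AP of 0.5 × 1 + 0.5 × 2/3 ≈ 0.833.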

    Enhanced Augmented Reality Framework for Sports Entertainment Applications

    Augmented Reality (AR) superimposes virtual information on real-world data, for example by displaying useful information on videos or images of a scene. This dissertation presents an Enhanced AR (EAR) framework for displaying useful information on images of a sports game. The main challenge in such applications is robust object detection and recognition, which becomes even harder in strong sunlight. We address the case where a captured image is degraded by strong sunlight. The framework begins with an image enhancement step that improves the accuracy of subsequent player and face detection; enhancement is followed by player detection, face detection, player recognition and display of the players' personal information. First, an algorithm based on Multi-Scale Retinex (MSR) is proposed for image enhancement. For player and face detection, we use adaptive boosting (AdaBoost) with Haar-like features for both feature selection and classification. The player face recognition algorithm uses adaptive boosting with Linear Discriminant Analysis (LDA) for feature selection and a nearest-neighbor classifier for classification. The framework can be deployed in any sport where a viewer captures images, and displaying player-specific information enhances the end-user experience. Detailed experiments are performed on 2096 diverse images captured with a digital camera and a smartphone; the images show players in different poses, with different expressions and under different illumination. The player face recognition module requires players' faces to be frontal or within ±35° of pose variation. The work demonstrates the great potential of computer-vision-based approaches for the future development of AR applications.
    COMSATS Institute of Information Technology
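The enhancement step builds on Multi-Scale Retinex (MSR). The dissertation's own variant is not described in the abstract, so the sketch below shows only the textbook MSR for a single grayscale channel, R = Σ_n w_n [log I − log(G_σn * I)], where G_σn is a Gaussian surround of scale σ_n. Equal weights and the scales (15, 80, 250) are assumed defaults, and `gaussian_blur` and `to_uint8` are hypothetical helpers written in plain NumPy with edge padding; colour images would be processed per channel.

```python
import numpy as np

def gaussian_kernel(sigma):
    """Normalised 1-D Gaussian kernel truncated at 3 sigma."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def gaussian_blur(img, sigma):
    """Separable Gaussian blur (hypothetical helper) with edge padding."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    padded = np.pad(img, r, mode="edge")
    # Blur along rows, then along columns; "valid" trims the padding again
    rows = np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda v: np.convolve(v, k, mode="valid"), 0, rows)

def multi_scale_retinex(img, sigmas=(15, 80, 250), weights=None):
    """Textbook MSR on one channel: weighted sum of log(I) - log(blur(I))."""
    img = np.asarray(img, dtype=float) + 1.0  # offset avoids log(0)
    if weights is None:
        weights = [1.0 / len(sigmas)] * len(sigmas)
    out = np.zeros_like(img)
    for w, s in zip(weights, sigmas):
        out += w * (np.log(img) - np.log(gaussian_blur(img, s)))
    return out

def to_uint8(r):
    """Linearly stretch the retinex output to the 0-255 display range."""
    lo, hi = float(r.min()), float(r.max())
    if hi - lo < 1e-12:
        return np.zeros(r.shape, dtype=np.uint8)
    return np.round(255.0 * (r - lo) / (hi - lo)).astype(np.uint8)
```

A uniformly lit region maps to zero, because the image equals its own surround; MSR responds only to local contrast. Pure-NumPy convolution is slow at the larger scales, so in practice a library Gaussian filter would replace `gaussian_blur`.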

    A Dual-Modality Emotion Recognition System of EEG and Facial Images and its Application in Educational Scene

    With the development of computer science, people's interactions with or through computers have become more frequent. Familiar examples of human-computer and computer-mediated human-to-human interaction include online chat, online banking services and facial recognition functions. When communication is limited to text messages, however, the effectiveness of information transfer can drop to around 30% of the original. Communication becomes truly efficient when we can see one another's reactions and feel one another's emotions. This issue is especially noticeable in education. In offline teaching, the classic teaching style, teachers can judge a student's current emotional state from their expressions and adjust their teaching methods accordingly. With advances in computing and the impact of Covid-19, an increasing number of schools and educational institutions are adopting online or video-based instruction. In such circumstances, it is difficult for teachers to get feedback from students. This thesis therefore proposes an emotion recognition method for educational scenarios, which can help teachers quantify students' emotional states in class and guide them in exploring or adjusting teaching methods. Text, physiological signals, gestures, facial photographs and other data types are commonly used for emotion recognition. Among them, data collection for facial-image emotion recognition is particularly convenient and fast, but people may subjectively conceal their true emotions, leading to inaccurate recognition results. Emotion recognition based on EEG signals can compensate for this drawback. Taking these issues into account, this thesis first uses SVM-PCA to classify emotions in EEG data, then uses a deep CNN to classify emotions in the subject's facial images. Finally, Dempster-Shafer (D-S) evidence theory is used to fuse and analyse the two classification results, giving a final emotion recognition accuracy of 92%.
    The specific research content of this thesis is as follows: 1) The background of emotion recognition systems used in teaching scenarios is discussed, together with the use of various single-modality systems for emotion recognition. 2) EEG emotion recognition based on SVM is analysed in detail. The theory of EEG signal generation, frequency band characteristics and emotional dimensions is introduced. The EEG signal is first filtered and cleaned of artifacts, then used for feature extraction with wavelet transforms, and finally fed into the proposed SVM-PCA for emotion recognition, reaching an accuracy of 64%. 3) Emotions in facial images are recognised with the proposed deep CNN. First, the AdaBoost algorithm is used to detect and crop the face region in the image, and gray-level balancing is applied to the cropped image. The preprocessed images are then used to train and test the deep CNN, with an average accuracy of 88%. 4) A decision-level fusion method is developed: the results of EEG emotion recognition and facial expression emotion recognition are fused at the decision level using D-S evidence theory, giving a dual-modality system accuracy of 92%. 5) A data collection approach for the dual-modality emotion recognition system is designed, and real data from an educational scene are collected and analysed following this process; on these data, the final accuracy of the dual-modality system is 82%. Teachers can use the emotion recognition results as a guide and reference to improve their teaching efficacy.
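Decision-level fusion with D-S evidence theory combines two sources using Dempster's rule: the masses of every pair of hypotheses are multiplied and credited to their intersection, and the conflicting mass K (pairs whose intersection is empty) is normalised away by dividing by 1 − K. The abstract does not say how the SVM-PCA and deep-CNN outputs were converted into mass functions, so the emotion classes and mass values below are hypothetical, with leftover mass assigned to the whole frame of discernment to express uncertainty.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function maps frozenset(hypotheses) -> mass, masses summing to 1.
    Returns the normalised combined masses and the conflict K.
    """
    combined = {}
    conflict = 0.0
    for (a, ma), (b, mb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + ma * mb
        else:
            conflict += ma * mb  # the two sources disagree completely
    if conflict >= 1.0 - 1e-12:
        raise ValueError("Total conflict: sources cannot be combined")
    return {k: v / (1.0 - conflict) for k, v in combined.items()}, conflict

# Hypothetical example: three emotion classes, residual mass on the full frame
FRAME = frozenset({"happy", "neutral", "sad"})
eeg_masses = {frozenset({"happy"}): 0.5, frozenset({"neutral"}): 0.3, FRAME: 0.2}
face_masses = {frozenset({"happy"}): 0.7, frozenset({"sad"}): 0.2, FRAME: 0.1}
fused, conflict = dempster_combine(eeg_masses, face_masses)
# conflict = 0.37; fused[frozenset({"happy"})] = 0.54 / 0.63 ≈ 0.857
```

Here the two sources agree on "happy", so fusion sharpens that belief beyond either source alone, which is the intended benefit of combining the EEG and facial modalities.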

    Pedestrian and Vehicle Detection in Autonomous Vehicle Perception Systems—A Review

    Autonomous Vehicles (AVs) have the potential to solve many traffic problems, such as accidents, congestion and pollution. However, challenges remain: for instance, AVs need to perceive their environment accurately to navigate safely in busy urban scenarios. The aim of this paper is to review recent articles on computer vision techniques that can be used to build an AV perception system. AV perception systems need to accurately detect non-static objects and predict their behaviour, as well as detect static objects and recognise the information they provide. This paper focuses in particular on the computer vision techniques used to detect pedestrians and vehicles. Many papers and reviews on pedestrian and vehicle detection exist, but most have reviewed pedestrian or vehicle detection separately. This review presents an overview of AV systems in general, then reviews and investigates several computer vision techniques for detecting pedestrians and vehicles. The review concludes that both traditional and Deep Learning (DL) techniques have been used for pedestrian and vehicle detection, with DL techniques showing the best results. Although good detection results have been achieved for pedestrians and vehicles, current algorithms still struggle to detect small, occluded and truncated objects. In addition, there is limited research on improving detection performance in difficult lighting and weather conditions. Most algorithms have been tested on well-recognised datasets such as Caltech and KITTI; however, these datasets have their own limitations. This paper therefore recommends that future work use newer, more challenging datasets such as PIE and BDD100K.
    EPSRC DTP PhD studentship

    A Review of Codebook Models in Patch-Based Visual Object Recognition

    The codebook model-based approach, while ignoring any structural aspect of vision, nonetheless provides state-of-the-art performance on current datasets. The key role of a visual codebook is to map low-level features into a fixed-length vector in histogram space, to which standard classifiers can be directly applied. The discriminative power of the codebook determines the quality of the codebook model, whereas the size of the codebook controls the complexity of the model. The construction of a codebook is therefore an important step, usually done by cluster analysis. However, clustering retains regions of high density in a distribution, so the resulting codebook need not have discriminant properties. Codebook construction is also recognised as a computational bottleneck of such systems. In our recent work, we proposed a resource-allocating codebook for constructing a discriminant codebook in a one-pass design procedure; it slightly outperforms more traditional approaches at drastically reduced computing times. In this review we survey several approaches proposed over the last decade, covering their use of feature detectors, descriptors, codebook construction schemes, choice of classifiers for recognising objects, and the datasets used to evaluate the proposed methods.
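The codebook's role of mapping a variable number of local descriptors to a fixed-length histogram can be sketched as follows. The construction step is a simplified, assumed version of a one-pass resource-allocating scheme: a descriptor becomes a new codeword only when it lies farther than a fixed radius from every existing codeword, so no iterative clustering is needed. The exact allocation rule of the cited work may differ; the function names and the hard nearest-codeword assignment are illustrative choices.

```python
import numpy as np

def build_rac_codebook(features, radius):
    """One-pass codebook (simplified, assumed allocation rule).

    Scans the descriptors once; a descriptor becomes a new codeword only if
    its Euclidean distance to every existing codeword exceeds `radius`.
    """
    features = np.asarray(features, dtype=float)
    codebook = [features[0]]
    for f in features[1:]:
        dists = np.linalg.norm(np.asarray(codebook) - f, axis=1)
        if dists.min() > radius:
            codebook.append(f)
    return np.asarray(codebook)

def encode_histogram(features, codebook):
    """Map any number of descriptors to a fixed-length, L1-normalised histogram."""
    features = np.asarray(features, dtype=float)
    # Squared distances, shape (num_descriptors, num_codewords)
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)  # hard assignment to the closest codeword
    hist = np.bincount(nearest, minlength=len(codebook)).astype(float)
    return hist / hist.sum()
```

The radius plays the part that the number of clusters plays in k-means: a smaller radius yields a larger codebook and therefore a longer, more detailed histogram for the classifier.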

    Object detection, recognition and classification using computer vision and artificial intelligence approaches

    Object detection and recognition have been used extensively in recent years to solve numerous challenges in different fields. Because of the vital roles they play, object detection and recognition have enabled quantum leaps in many industries by helping to overcome serious challenges and obstacles. For example, worldwide security concerns have drawn attention to, and stimulated the use of, highly intelligent computer vision technology to provide security in different environments and diverse terrains. In addition, some wildlife is currently threatened with extinction worldwide, so early detection and recognition of potential threats to wildlife have become essential and timely. The use of computer vision and artificial intelligence to make a seemingly insecure world more secure has been widely accepted. Such technologies are used for monitoring, tracking, organising and analysing objects in a scene, and for countless other purposes. [Continues.]

    Performance Analysis of Different Optimization Algorithms for Multi-Class Object Detection

    Object recognition is an important approach for recognizing relevant objects in an image. Advances in computer vision, aided by local feature detection methods, make it possible to address highly difficult tasks. Detecting multi-class objects is quite challenging, and much existing research has worked to improve overall accuracy; however, these methods suffer from limitations such as high network loss, degraded training ability, improper consideration of features and poor convergence. The proposed research introduces a hybrid convolutional neural network (H-CNN) approach to overcome these drawbacks. The input images are first pre-processed with Gaussian filtering to remove noise and enhance image quality. After pre-processing, the objects in the images are localized using Grid Guided Localization (GGL). Effective features are extracted from the localized objects using the AlexNet model, and different objects are classified by replacing the final softmax layer of AlexNet with a Support Vector Regression (SVR) model. The network losses are optimized using the Improved Grey Wolf (IGW) optimization procedure. The performance of the proposed model is analyzed using Python on several datasets, including MIT-67, PASCAL VOC2010, Microsoft (MS) COCO and MSRC. To choose the best algorithm, performance is compared across loss optimization algorithms including improved Particle Swarm Optimization (IPSO), improved Genetic Algorithm (IGA), improved dragonfly algorithm (IDFA), improved simulated annealing algorithm (ISAA) and improved bacterial foraging algorithm (IBFA). The proposed model attains accuracies of 95.04% on PASCAL VOC2010, 96.02% on MIT-67, 97.37% on MSRC and 94.53% on MS COCO.
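The loss optimization relies on an Improved Grey Wolf (IGW) procedure whose modifications are not described in the abstract. For orientation only, the sketch below implements the standard Grey Wolf Optimizer rather than the improved variant: the three best positions found so far (alpha, beta, delta) guide every wolf, and the coefficient a decays linearly from 2 to 0 to shift from exploration to exploitation. It minimizes a generic objective over a box; the toy sphere function in the usage line is an assumed stand-in for a network loss, and the parameter defaults are illustrative.

```python
import numpy as np

def grey_wolf_optimize(objective, dim, bounds, n_wolves=20, n_iter=200, seed=0):
    """Standard Grey Wolf Optimizer minimizing `objective` over a box.

    Requires n_wolves >= 3 so that alpha, beta and delta exist.
    """
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    wolves = rng.uniform(lo, hi, size=(n_wolves, dim))
    fitness = np.array([objective(w) for w in wolves])
    # Alpha, beta and delta: the three best positions found so far
    order = np.argsort(fitness)[:3]
    leaders, leader_fit = wolves[order].copy(), fitness[order]

    for t in range(n_iter):
        a = 2.0 - 2.0 * t / n_iter  # decays linearly from 2 towards 0
        new_positions = np.zeros_like(wolves)
        for leader in leaders:
            r1 = rng.random((n_wolves, dim))
            r2 = rng.random((n_wolves, dim))
            A = 2.0 * a * r1 - a  # |A| > 1 explores, |A| < 1 exploits
            C = 2.0 * r2
            D = np.abs(C * leader - wolves)  # distance to this leader
            new_positions += leader - A * D
        # Each wolf moves to the average of its three leader-guided positions
        wolves = np.clip(new_positions / 3.0, lo, hi)
        fitness = np.array([objective(w) for w in wolves])

        # Keep the best three positions seen so far as the new leaders
        pool = np.vstack([leaders, wolves])
        pool_fit = np.concatenate([leader_fit, fitness])
        order = np.argsort(pool_fit)[:3]
        leaders, leader_fit = pool[order].copy(), pool_fit[order]

    return leaders[0], float(leader_fit[0])

# Usage on a toy objective (assumed stand-in for a network loss)
best_x, best_f = grey_wolf_optimize(lambda x: float(np.sum(x ** 2)), dim=5, bounds=(-10.0, 10.0))
```

Because the leaders are only ever replaced by better positions, the returned value never gets worse across iterations; the comparison algorithms listed above (IPSO, IGA and so on) would slot into the same role of minimizing the loss.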

    Object Detection with Active Sample Harvesting

    The work presented in this dissertation lies in the domains of image classification, object detection and machine learning. Whether training image classifiers or object detectors, the learning phase consists of finding an optimal boundary between populations of samples. In practice, not all samples are equally important: some examples are trivially classified and contribute little to training, while others that lie close to the boundary or are misclassified are the ones that truly matter. Similarly, the images from which the samples originate are not all equally rich in informative samples. However, most training procedures select samples and images uniformly or weight them equally. The common thread of this dissertation is how to efficiently find informative samples and images for training. Although we never consider all possible samples "in the world", our purpose is to select samples in a smarter manner, without looking at all the available ones. The framework adopted in this work organises the data (samples or images) in a tree that reflects the statistical regularities of the training samples by putting "similar" samples in the same branch. Each leaf carries a sample and a weight related to the "importance" of that sample, and each internal node carries statistics about the weights below it. The tree is used to select the next sample or image for training by applying a sampling policy, and the "importance" weights are updated accordingly to bias sampling towards informative samples or images in future iterations. Our experiments show that, across various applications, properly focusing on informative images or samples improves the learning phase, either by reaching better performance faster or by reducing the training loss faster.
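A tree whose leaves hold importance weights and whose internal nodes summarise the weights below them can be sketched as a sum tree: each internal node stores the sum of its subtree, so drawing a sample with probability proportional to its weight, or updating one weight, takes O(log n) steps. This is an assumed, simplified instance. The dissertation's tree groups "similar" samples in the same branch, and its exact node statistics, sampling policy and weight-update rule are not given in the abstract; here the tree is a plain array-backed binary tree over sample indices, and the class name is hypothetical.

```python
import random

class SumTree:
    """Binary tree: leaves hold sample weights, internal nodes hold subtree sums."""

    def __init__(self, weights):
        self.n = len(weights)
        self.size = 1
        while self.size < self.n:
            self.size *= 2
        # Node 1 is the root; the children of node i are 2i and 2i + 1
        self.tree = [0.0] * (2 * self.size)
        for i, w in enumerate(weights):
            self.tree[self.size + i] = float(w)
        for node in range(self.size - 1, 0, -1):
            self.tree[node] = self.tree[2 * node] + self.tree[2 * node + 1]

    def total(self):
        return self.tree[1]

    def update(self, index, weight):
        """Set one sample's weight and refresh the sums on the path to the root."""
        node = self.size + index
        self.tree[node] = float(weight)
        node //= 2
        while node >= 1:
            self.tree[node] = self.tree[2 * node] + self.tree[2 * node + 1]
            node //= 2

    def sample(self, rng=random):
        """Draw a sample index with probability proportional to its weight."""
        if self.total() <= 0.0:
            raise ValueError("All weights are zero")
        u = rng.random() * self.total()
        node = 1
        while node < self.size:
            left = 2 * node
            # Go left if u falls in the left mass; also guard against rounding
            # pushing u into an empty right subtree
            if u < self.tree[left] or self.tree[left + 1] <= 0.0:
                node = left
            else:
                u -= self.tree[left]
                node = left + 1
        return node - self.size
```

After training on a drawn sample, `update` could set its weight to, for example, its current loss (a hypothetical choice), so that harder samples are drawn more often in later iterations.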