    Feature fusion H-ELM based learned features and hand-crafted features for human activity recognition

    Recognizing human activities is one of the main goals of human-centered intelligent systems. Smartphone sensors produce a continuous sequence of observations that is noisy, unstructured, and high dimensional, so efficient features have to be extracted in order to perform accurate classification. This paper proposes a combination of Hierarchical and Kernel Extreme Learning Machine (HK-ELM) methods to learn features and map them to specific classes in a short time. Moreover, a feature fusion approach is proposed to combine H-ELM learned features with hand-crafted ones. The proposed method was found to outperform state-of-the-art methods in terms of accuracy and training time, achieving an accuracy of 97.62% and a training time of 3.4 seconds on a normal Central Processing Unit (CPU).
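    A minimal numpy sketch of the fusion idea follows (not the authors' code): learned features from a random-weight ELM layer are concatenated with hand-crafted features, and a kernel ELM classifier is trained, which amounts to a closed-form solve against one-hot labels. All sizes, data, and hyperparameters are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)

    def elm_features(X, n_hidden=256):
        # One random ELM layer: fixed random weights, sigmoid activation.
        W = rng.standard_normal((X.shape[1], n_hidden))
        b = rng.standard_normal(n_hidden)
        return 1.0 / (1.0 + np.exp(-(X @ W + b)))

    def kernel_elm_train(X, Y, C=1.0, gamma=0.1):
        # Kernel ELM training reduces to solving (K + I/C) alpha = Y
        # for an RBF kernel K over the training samples.
        sq = np.sum((X[:, None] - X[None]) ** 2, axis=-1)
        K = np.exp(-gamma * sq)
        return np.linalg.solve(K + np.eye(len(X)) / C, Y)

    X_hand = rng.standard_normal((100, 40))              # hand-crafted features (toy)
    X_fused = np.hstack([X_hand, elm_features(X_hand)])  # feature fusion
    Y = np.eye(6)[rng.integers(0, 6, 100)]               # one-hot activity labels
    alpha = kernel_elm_train(X_fused, Y)                 # analytic fit, no iterations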

    Utilizing hierarchical extreme learning machine based reinforcement learning for object sorting

    Automatic and intelligent object sorting is an important task in which a robot arm carries each object from one location to another without human intervention. These objects vary in colour, shape, size, and orientation. Many applications, such as fruit and vegetable grading, flower grading, and biopsy image grading, depend on sorting for a structural arrangement. Traditional machine learning methods, which rely on extracting handcrafted features, are used for this task, but these features are sometimes not discriminative because of environmental factors such as lighting changes. In this study, the Hierarchical Extreme Learning Machine (HELM) is utilized as an unsupervised feature learner applied directly to the object observations, and HELM was found to be robust against external change. Reinforcement learning (RL) is used to find the optimal sorting policy that maps each object image to the object's location; RL is utilized because this automatic task lacks output labels. Learning proceeds sequentially over many episodes, and the sorting accuracy increases at each episode until it reaches its maximum level at the end of learning. The experimental results demonstrated that the proposed HELM-RL sorting can provide the same accuracy as the labelled, supervised HELM method after many episodes.
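    The reinforcement learning side can be pictured with a small tabular Q-learning sketch (an assumption of the general setup, not the paper's exact formulation): states stand for discretized H-ELM feature clusters, actions are candidate bins, and a reward of 1 marks a correct placement.

    import numpy as np

    rng = np.random.default_rng(1)
    n_states, n_actions = 20, 4          # feature clusters x candidate bins (toy)
    Q = np.zeros((n_states, n_actions))
    lr, eps = 0.1, 0.2                   # learning rate, exploration rate

    def reward(state, action):
        # Stand-in oracle: each cluster has one correct bin.
        return 1.0 if action == state % n_actions else 0.0

    for episode in range(2000):
        s = int(rng.integers(n_states))
        # Epsilon-greedy action selection, then a one-step value update.
        a = int(rng.integers(n_actions)) if rng.random() < eps else int(Q[s].argmax())
        Q[s, a] += lr * (reward(s, a) - Q[s, a])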

    Exploring the Potential of Generative AI for the World Wide Web

    Generative Artificial Intelligence (AI) is a cutting-edge technology capable of producing text, images, and various media content by leveraging generative models and user prompts. Between 2022 and 2023, generative AI surged in popularity with a plethora of applications spanning from AI-powered movies to chatbots. In this paper, we delve into the potential of generative AI within the realm of the World Wide Web, specifically focusing on image generation. Web developers already harness generative AI to help craft text and images, while Web browsers might use it in the future to locally generate images for tasks like repairing broken webpages, conserving bandwidth, and enhancing privacy. To explore this research area, we have developed WebDiffusion, a tool that simulates a Web powered by Stable Diffusion, a popular text-to-image model, from both a client and a server perspective. WebDiffusion further supports crowdsourcing of user opinions, which we use to evaluate the quality and accuracy of 409 AI-generated images sourced from 60 webpages. Our findings suggest that generative AI is already capable of producing pertinent and high-quality Web images, even without requiring Web designers to manually input prompts, just by leveraging contextual information available within the webpages. However, we acknowledge that direct in-browser image generation remains a challenge, as only highly powerful GPUs, such as the A40 and A100, can (partially) compete with classic image downloads. Nevertheless, this approach could be valuable for a subset of the images, for example when fixing broken webpages or handling highly private content. Comment: 11 pages, 9 figures.
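    In practice, the generation step could look like the following hedged sketch using the Hugging Face diffusers library (a stand-in for the WebDiffusion tool itself); the model id and the alt-text-as-prompt heuristic are assumptions.

    import torch
    from diffusers import StableDiffusionPipeline

    # Load a Stable Diffusion checkpoint; a CUDA GPU is assumed here.
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    # Build the prompt from webpage context (alt text, page title) instead
    # of asking the designer for a manual prompt.
    alt_text = "A lighthouse on a rocky coast at sunset"
    page_title = "Travel blog: the coast of Brittany"
    prompt = f"{alt_text}, photo illustration for '{page_title}'"

    image = pipe(prompt, num_inference_steps=25).images[0]
    image.save("generated.png")   # stands in for the broken <img> resource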

    Benchmarking different deep regression models for predicting image rotation angle and robot’s end effector’s position

    Deep visual regression models play an important role in capturing the relationship between visual data (images) and a predicted continuous output. Recently, deep visual regression has been utilized in different applications such as age prediction, digital holography, and head-pose estimation. Deep learning has recently been at the cutting edge of research, but most papers have focused on utilizing it for classification tasks, and there is still a lack of research that uses deep learning for regression. This paper utilizes different deep learning models for two regression tasks. The first is the prediction of an image's rotation angle; the second is the prediction of the position of a robot's end-effector in 2D space. Efficient features were learned or extracted in order to perform good regression. The paper demonstrates and compares various models, namely the Local Receptive Field Extreme Learning Machine (LRF-ELM), Hierarchical ELM (H-ELM), a supervised Convolutional Neural Network (CNN), and a pre-trained CNN such as AlexNet. Each model was trained to learn or extract features and map them to a specific continuous output. The results show that all models gave good performance in terms of RMSE and accuracy, and H-ELM was found to outperform the other models in terms of training speed.
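    A hedged PyTorch sketch of a deep regression head follows (an assumption of the general setup, not the paper's exact models): a CNN backbone has its final classification layer replaced by a linear layer predicting the continuous target, trained with mean squared error.

    import torch
    import torch.nn as nn
    from torchvision import models

    model = models.alexnet(weights=None)       # untrained backbone for the sketch
    model.classifier[6] = nn.Linear(4096, 1)   # 1 output: rotation angle
                                               # (use 2 outputs for an x, y position)
    criterion = nn.MSELoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

    images = torch.randn(8, 3, 224, 224)       # toy batch
    angles = torch.rand(8, 1) * 360.0          # toy ground-truth angles

    optimizer.zero_grad()
    loss = criterion(model(images), angles)    # RMSE is sqrt(MSE) at eval time
    loss.backward()
    optimizer.step()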

    Localization and Classification of Parasitic Eggs in Microscopic Images Using an EfficientDet Detector

    Intestinal parasitic infections (IPIs) caused by protozoan and helminth parasites are among the most common infections in humans in low- and middle-income countries (LMICs). They are regarded as a severe public health concern, as they cause a wide array of potentially detrimental health conditions. Researchers have been developing pattern recognition techniques for the automatic identification of parasite eggs in microscopic images. Existing solutions still need improvements to reduce diagnostic errors and generate fast, efficient, and accurate results. Our paper addresses this and proposes a multi-modal learning detector to localize parasitic eggs and categorize them into 11 categories. The experiments were conducted on the novel Chula-ParasiteEgg-11 dataset, which contains 11,000 microscopic training images from 11 categories and was used to train both an EfficientDet model with an EfficientNet-v2 backbone and an EfficientNet-B7+SVM pipeline. Our results show robust performance with an accuracy of 92% and an F1 score of 93%. Additionally, the IoU distribution illustrates the high localization capability of the detector. Comment: 6 pages, 7 figures, to be published in IEEE International Conference on Image Processing 2022.
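    As a rough illustration of a detector's post-processing stage (not the authors' pipeline), the sketch below filters raw boxes by confidence and non-maximum suppression before reading off each surviving box's category; all tensors are dummies.

    import torch
    from torchvision.ops import nms

    # Dummy raw detector outputs: boxes (x1, y1, x2, y2), scores, labels.
    boxes = torch.tensor([[10., 10., 60., 60.],
                          [12., 12., 58., 62.],
                          [100., 80., 150., 140.]])
    scores = torch.tensor([0.91, 0.85, 0.40])
    labels = torch.tensor([3, 3, 7])              # indices into the 11 categories

    keep = scores > 0.5                           # confidence threshold
    boxes, scores, labels = boxes[keep], scores[keep], labels[keep]
    keep = nms(boxes, scores, iou_threshold=0.5)  # drop overlapping duplicates
    for b, s, l in zip(boxes[keep], scores[keep], labels[keep]):
        print(f"category {int(l)}: score {s:.2f}, box {b.tolist()}")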

    Real-Time Human Detection for Aerial Captured Video Sequences via Deep Models

    Human detection in videos plays an important role in various real-life applications. Most traditional approaches depend on utilizing handcrafted features, which are problem-dependent and optimal only for specific tasks, and they are highly susceptible to dynamical events such as illumination changes, camera jitter, and variations in object size. Feature learning approaches, on the other hand, are cheaper and easier because highly abstract and discriminative features can be produced automatically without the need for expert knowledge. In this paper, we utilize automatic feature learning methods that combine optical flow with three different deep models (a supervised convolutional neural network (S-CNN), a pretrained CNN feature extractor, and a hierarchical extreme learning machine (H-ELM)) for human detection in videos captured by a non-static camera on an aerial platform at varying altitudes. The models are trained and tested on the publicly available and highly challenging UCF-ARG aerial dataset, and they are compared in terms of training accuracy, testing accuracy, and learning speed. The performance evaluation considers five human actions (digging, waving, throwing, walking, and running). Experimental results demonstrated that the proposed methods are successful for the human detection task. The pretrained CNN produces an average accuracy of 98.09%; the S-CNN produces an average accuracy of 95.6% with softmax and 91.7% with Support Vector Machines (SVM); and H-ELM has an average accuracy of 95.9%. Training H-ELM takes 445 seconds on a normal Central Processing Unit (CPU), while training the S-CNN takes 770 seconds on a high-performance Graphical Processing Unit (GPU).
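    The motion-proposal step can be sketched with OpenCV's dense optical flow (an illustrative assumption, not the paper's exact procedure): flow magnitude between consecutive frames highlights moving blobs, whose crops would then be passed to the deep models above. The frame filenames are placeholders.

    import cv2
    import numpy as np

    prev = cv2.imread("frame_000.png", cv2.IMREAD_GRAYSCALE)
    curr = cv2.imread("frame_001.png", cv2.IMREAD_GRAYSCALE)

    # Farneback dense optical flow between two consecutive aerial frames.
    flow = cv2.calcOpticalFlowFarneback(prev, curr, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    mag = np.linalg.norm(flow, axis=2)             # per-pixel motion magnitude
    mask = (mag > mag.mean() + 2 * mag.std()).astype(np.uint8)

    # Moving blobs become candidate human regions for the CNN / H-ELM stage.
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    for c in contours:
        x, y, w, h = cv2.boundingRect(c)
        if w * h > 100:                            # ignore tiny noise blobs
            crop = curr[y:y + h, x:x + w]          # would be fed to the classifier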

    Transfer Detection of YOLO to Focus CNN’s Attention on Nude Regions for Adult Content Detection

    Video pornography and nudity detection aims to detect people in videos and classify them as nude or normal for censorship purposes. Recent literature has demonstrated pornography detection utilising a convolutional neural network (CNN) to extract features directly from whole frames and a support vector machine (SVM) to classify the extracted features into two categories. However, existing methods were not able to detect small-scale pornographic and nude content in frames with diverse backgrounds, a limitation that led to a high false-negative rate (FNR) and the misclassification of nude frames as normal ones. To address this, this paper tackles the limitation of existing convolutional-only approaches by focusing the visual attention of the CNN on the expected nude regions inside the frames to reduce the FNR. The You Only Look Once (YOLO) object detector was transferred to the pornography and nudity detection application to detect persons as regions of interest (ROIs), which were then passed to a CNN and an SVM for nude/normal classification. Several experiments were conducted to compare the performance of various CNNs and classifiers using our proposed dataset. ResNet101 with a random forest classifier outperformed the other models, with an F1-score of 90.03% and an accuracy of 87.75%. Furthermore, an ablation study was performed to demonstrate the impact of adding YOLO before the CNN: YOLO-CNN outperformed the CNN-only model, increasing accuracy from 85.5% to 89.5%. Additionally, a new benchmark dataset with challenging content, including various human sizes and backgrounds, is proposed.
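    A hedged sketch of the two-stage idea (YOLO person detection followed by ROI classification) is given below, using the modern ultralytics package as a stand-in for the YOLO version used in the paper; the model name, input file, and downstream classifier are assumptions.

    from ultralytics import YOLO
    import cv2

    detector = YOLO("yolov8n.pt")                  # pretrained COCO detector
    frame = cv2.imread("frame.jpg")

    rois = []
    for r in detector(frame):
        for box, cls in zip(r.boxes.xyxy, r.boxes.cls):
            if int(cls) == 0:                      # COCO class 0 = person
                x1, y1, x2, y2 = map(int, box)
                rois.append(frame[y1:y2, x1:x2])   # person ROI

    # Each ROI would then go through a CNN feature extractor (e.g. ResNet101)
    # and a random-forest classifier for the nude/normal decision.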

    HELM based Reinforcement Learning for Goal Localization

    The objective of goal localization is to find the location of goals in noisy environments. Simple actions are performed to move the agent towards the goal. The goal detector should be capable of minimizing the error between the predicted locations and the true ones, and only a few regions are processed by the agent in order to reduce the computational effort and increase the speed of convergence. In this paper, a reinforcement learning method was utilized to find the optimal series of actions to localize the goal region. The visual data, a set of images, is high-dimensional unstructured data and needs to be represented efficiently to obtain a robust detector. The Hierarchical Extreme Learning Machine (H-ELM) algorithm was used to find good features for effective representation. The results were analysed using MATLAB.

    Hierarchical extreme learning machine based reinforcement learning for goal localization

    The objective of goal localization is to find the location of goals in noisy environments. Simple actions are performed to move the agent towards the goal. The goal detector should be capable of minimizing the error between the predicted locations and the true ones, and only a few regions need to be processed by the agent in order to reduce the computational effort and increase the speed of convergence. In this paper, a reinforcement learning (RL) method was utilized to find the optimal series of actions to localize the goal region. The visual data, a set of images, is high-dimensional unstructured data and needs to be represented efficiently to obtain a robust detector. Different deep reinforcement learning models have already been used to localize a goal, but most of them take a long time to learn because of the iterative weight fine-tuning stage needed to find an accurate model. The Hierarchical Extreme Learning Machine (H-ELM) was used here as a fast deep model that does not fine-tune the weights: hidden weights are generated randomly and output weights are calculated analytically. The H-ELM algorithm was used in this work to find good features for effective representation. This paper proposes a combination of the Hierarchical Extreme Learning Machine and reinforcement learning to find an optimal policy directly from visual input; this combination outperforms other methods in terms of accuracy and learning speed. The simulations and results were analysed using MATLAB.
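    The "no fine-tuning" property described above can be illustrated with a minimal single-layer ELM in numpy: hidden weights are random and fixed, and output weights are solved in closed form with a pseudo-inverse. Sizes and data are toy assumptions.

    import numpy as np

    rng = np.random.default_rng(2)
    X = rng.standard_normal((200, 64))             # flattened image features (toy)
    Y = np.eye(4)[rng.integers(0, 4, 200)]         # one-hot targets (toy)

    W = rng.standard_normal((64, 128))             # random hidden weights, never trained
    b = rng.standard_normal(128)
    H = np.tanh(X @ W + b)                         # hidden activations
    beta = np.linalg.pinv(H) @ Y                   # analytic output weights

    pred = (np.tanh(X @ W + b) @ beta).argmax(1)   # sanity check on training data
    print("train acc:", (pred == Y.argmax(1)).mean())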

    RGB-D Based Multimodal Convolutional Neural Networks for Spacecraft Recognition

    Spacecraft recognition is a significant component of space situational awareness (SSA), especially for applications such as active debris removal, on-orbit servicing, and satellite formation. Recognition in actual space imagery is complicated by a large diversity of sensing conditions, including background noise, low signal-to-noise ratio, different orbital scenarios, and high contrast. This paper addresses this problem and proposes multimodal convolutional neural networks (CNNs) for spacecraft detection and classification. The proposed solution includes two models: 1) a pre-trained ResNet50 CNN connected to a support vector machine (SVM) classifier for classification of RGB images, and 2) an end-to-end CNN for classification of depth images. The experiments were conducted on the novel SPARK dataset, which was generated under a realistic space simulation environment and contains 150k RGB images and 150k depth images across 11 categories. The results show high performance of the proposed solution in terms of accuracy (89%), F1 score (87%), and Perf metric (1.8).
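    The RGB branch described above (ResNet50 features fed to an SVM) can be sketched as follows, with toy tensors standing in for SPARK images; the use of the pooled 2048-d features and the SVM settings are assumptions.

    import torch
    import torch.nn as nn
    from torchvision import models
    from sklearn.svm import SVC

    resnet = models.resnet50(weights=None)         # pre-trained weights in practice
    resnet.fc = nn.Identity()                      # expose 2048-d pooled features
    resnet.eval()

    with torch.no_grad():
        rgb = torch.randn(32, 3, 224, 224)         # toy RGB batch
        feats = resnet(rgb).numpy()                # (32, 2048) feature vectors

    labels = torch.randint(0, 11, (32,)).numpy()   # 11 spacecraft categories (toy)
    svm = SVC(kernel="rbf").fit(feats, labels)     # SVM head on CNN features
    print(svm.score(feats, labels))                # sanity check on toy data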