    Advances in technology continue to bring benefits and innovation, particularly in Deep Learning and, within it, Computer Vision. Facial expression detection is commonly applied to learners in educational settings and to employees in companies, where it is difficult to detect the expressions of many people at once, with more than one person present under the same conditions. This research focuses on implementing the MTCNN algorithm and the VGG-16 architecture. It follows the AI Project Life Cycle framework and uses the Streamlit framework to build the system. The dataset is FER 2013, comprising 67,885 images across 7 expression classes: angry, disgust, fear, happy, neutral, sad, and surprise. The dataset is split into three parts: 42,825 training images, 10,704 validation images, and 14,356 testing images. Training produced a best model with a training accuracy of 85.70% and a testing accuracy of 85.71%, with a training loss of 1.7759 and a testing loss of 1.7696. The ROC AUC was stable and showed no overfitting, with an ROC AUC score of 94%. A weakness of the detection system is that MTCNN cannot detect faces that are poorly lit, dark, or cut off.
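The split sizes reported in the abstract above can be sanity-checked with a few lines of Python. This is only a minimal sketch of the arithmetic: the counts are taken from the abstract, while the names `SPLIT_COUNTS`, `REPORTED_TOTAL`, and `split_fractions` are hypothetical and not part of the authors' code.

```python
# Image counts per split, as reported in the abstract.
SPLIT_COUNTS = {"train": 42_825, "validation": 10_704, "test": 14_356}
# Total dataset size, as reported in the abstract.
REPORTED_TOTAL = 67_885


def split_fractions(counts):
    """Return each split's share of the total number of images."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


fractions = split_fractions(SPLIT_COUNTS)

# The three splits should add up to the reported dataset size.
assert sum(SPLIT_COUNTS.values()) == REPORTED_TOTAL

for name, frac in fractions.items():
    print(f"{name:>10}: {SPLIT_COUNTS[name]:>6} images ({frac:.1%})")
```

The three counts do sum to the reported 67,885 images, which corresponds to roughly a 63/16/21 percent split between training, validation, and testing.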

    Multi-Objective Convolutional Neural Networks for Robot Localisation and 3D Position Estimation in 2D Camera Images

    The field of collaborative robotics and human-robot interaction often focuses on predicting human behaviour while assuming that the robot setup and configuration are known. This is often the case with fixed setups, in which all sensors are fixed and calibrated in relation to the rest of the system. However, it becomes a limiting factor when the system needs to be reconfigured or moved. We present a deep learning approach that aims to solve this issue. Our method learns to identify and precisely localise the robot in 2D camera images, so a fixed setup is no longer required and the camera can be moved. In addition, our approach identifies the robot type and estimates the 3D position of the robot base in the camera image as well as the 3D positions of each of the robot joints. Learning uses a multi-objective convolutional neural network that is trained on the four objectives above simultaneously through a combined loss function. The multi-objective approach makes the system more flexible and efficient by reusing some of the same features and diversifying for each objective in lower layers. A fully trained system shows promising results in providing an accurate mask of where the robot is located and an estimate of its base and joint positions in 3D. We compare the results to our previous approach of using cascaded convolutional neural networks. Comment: Ubiquitous Robots 2018 Regular paper submission
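The idea of training four objectives at once through a combined loss can be illustrated with a weighted sum of per-objective losses. The sketch below is an assumption-laden illustration, not the authors' formulation: the abstract does not state which loss is used per objective or how the terms are weighted, so the choice of binary cross-entropy for the mask, cross-entropy for the robot type, mean squared error for the 3D positions, and the `weights` dictionary are all hypothetical.

```python
import math


def binary_cross_entropy(pred, target, eps=1e-7):
    """Mean binary cross-entropy over flattened mask probabilities."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(pred)


def cross_entropy(probs, true_index, eps=1e-7):
    """Cross-entropy for a single sample given class probabilities."""
    return -math.log(max(probs[true_index], eps))


def mean_squared_error(pred, target):
    """Mean squared error over flattened coordinates."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def combined_loss(outputs, targets, weights):
    """Weighted sum of the four objective losses (weights are assumed)."""
    terms = {
        "mask": binary_cross_entropy(outputs["mask"], targets["mask"]),
        "robot_type": cross_entropy(outputs["robot_type"], targets["robot_type"]),
        "base_xyz": mean_squared_error(outputs["base_xyz"], targets["base_xyz"]),
        "joints_xyz": mean_squared_error(outputs["joints_xyz"], targets["joints_xyz"]),
    }
    total = sum(weights[name] * value for name, value in terms.items())
    return total, terms


# Toy example: a 4-pixel mask, 3 robot types, one base and two joints.
outputs = {
    "mask": [0.9, 0.8, 0.1, 0.2],
    "robot_type": [0.7, 0.2, 0.1],
    "base_xyz": [0.1, 0.0, 1.0],
    "joints_xyz": [0.1, 0.0, 1.2, 0.3, 0.1, 1.5],
}
targets = {
    "mask": [1, 1, 0, 0],
    "robot_type": 0,
    "base_xyz": [0.0, 0.0, 1.0],
    "joints_xyz": [0.0, 0.0, 1.2, 0.3, 0.0, 1.5],
}
weights = {"mask": 1.0, "robot_type": 1.0, "base_xyz": 1.0, "joints_xyz": 1.0}

total, terms = combined_loss(outputs, targets, weights)
print(f"combined loss: {total:.4f}")
```

Because all four terms feed one scalar, a single backward pass would update the shared features for every objective; in practice the weights usually need tuning so that no single term dominates.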

    Enhanced Emotion Recognition in Videos: A Convolutional Neural Network Strategy for Human Facial Expression Detection and Classification

    The human face is essential in conveying emotions, as facial expressions serve as effective, natural, and universal indicators of emotional states. Automated emotion recognition has garnered increasing interest due to its potential applications in various fields, such as human-computer interaction, machine learning, robotic control, and driver emotional state monitoring. With artificial intelligence and computational power advancements, visual emotion recognition has become a prominent research area. Despite extensive research employing machine learning algorithms like convolutional neural networks (CNN), challenges remain concerning input data processing, emotion classification scope, data size, optimal CNN configurations, and performance evaluation. To address these issues, we propose a comprehensive CNN-based model for real-time detection and classification of five primary emotions: anger, happiness, neutrality, sadness, and surprise. We employ the Amsterdam Dynamic Facial Expression Set – Bath Intensity Variations (ADFES-BIV) video dataset, extracting image frames from the video samples. Image processing techniques such as histogram equalization, color conversion, cropping, and resizing are applied to the frames before labeling. The Viola-Jones algorithm is then used for face detection on the processed grayscale images. We develop and train a CNN on the processed image data, implementing dropout, batch normalization, and L2 regularization to reduce overfitting. The ideal hyperparameters are determined through trial and error, and the model's performance is evaluated. The proposed model achieves a recognition accuracy of 99.38%, with the confusion matrix, recall, precision, F1 score, and processing time further quantifying its performance characteristics. The model's generalization performance is assessed using images from the Warsaw Set of Emotional Facial Expression Pictures (WSEFEP) and Extended Cohn-Kanade Database (CK+) datasets. 
The results demonstrate the efficiency and usability of our proposed approach, contributing valuable insights into real-time visual emotion recognition.
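Histogram equalization, one of the preprocessing steps named in the abstract above, can be sketched in plain Python. This is a minimal illustration of the standard technique on an 8-bit grayscale image stored as a list of rows; the function name `equalize_histogram` is hypothetical, and the abstract does not say which implementation the authors used (a library routine such as OpenCV's `cv2.equalizeHist` would be the usual choice in practice).

```python
def equalize_histogram(image, levels=256):
    """Equalize a grayscale image given as rows of ints in [0, levels - 1]."""
    pixels = [value for row in image for value in row]
    total = len(pixels)
    if total == 0:
        return [row[:] for row in image]

    # Histogram: number of pixels at each intensity level.
    hist = [0] * levels
    for value in pixels:
        hist[value] += 1

    # Cumulative distribution of intensities.
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running)

    # Smallest non-zero CDF value, i.e. the count of the darkest level present.
    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:
        # Constant image: nothing to spread out.
        return [row[:] for row in image]

    # Lookup table stretching the CDF over the full intensity range.
    scale = (levels - 1) / (total - cdf_min)
    lut = [max(0, round((c - cdf_min) * scale)) for c in cdf]
    return [[lut[value] for value in row] for row in image]


# A low-contrast toy image: intensities cluster between 10 and 40.
low_contrast = [[10, 20, 30, 40]]
print(equalize_histogram(low_contrast))
```

After equalization the darkest intensity present maps to 0 and the brightest to 255, which spreads a low-contrast frame over the full range before face detection.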