350 research outputs found
Reconhecimento de padrões em expressões faciais: algoritmos e aplicações (Pattern recognition in facial expressions: algorithms and applications)
Advisor: Hélio Pedrini. Thesis (doctorate), Universidade Estadual de Campinas, Instituto de Computação.
Abstract: Emotion recognition has become a relevant research topic for the scientific community, since it plays an essential role in the continuous improvement of human-computer interaction systems. It can be applied in various areas, such as medicine, entertainment, surveillance, biometrics, education, social networks, and affective computing. There are open challenges in developing emotion recognition systems based on facial expressions, such as obtaining data that reflect more spontaneous emotions and real scenarios. In this doctoral dissertation, we propose different methodologies for developing emotion recognition systems based on facial expressions and examine their applicability to other similar problems.
The first is an emotion recognition methodology for occluded facial expressions based on the Census Transform Histogram (CENTRIST). Occluded facial expressions are reconstructed using an algorithm based on Robust Principal Component Analysis (RPCA). Facial expression features are then extracted by CENTRIST, as well as by Local Binary Patterns (LBP), Local Gradient Coding (LGC), and an LGC extension. The generated feature space is reduced by applying Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA). K-Nearest Neighbor (KNN) and Support Vector Machine (SVM) algorithms are used for classification. This method reached competitive accuracy rates for occluded and non-occluded facial expressions.
The second proposes dynamic facial expression recognition based on Visual Rhythms (VR) and Motion History Images (MHI), such that a fusion of both descriptors encodes appearance, shape, and motion information of the video sequences. For feature extraction, the Weber Local Descriptor (WLD), CENTRIST, Histogram of Oriented Gradients (HOG), and Gray-Level Co-occurrence Matrix (GLCM) are employed. This approach offers a new direction for dynamic facial expression recognition, together with an analysis of the relevance of facial parts.
The third is an effective method for audio-visual emotion recognition based on speech and facial expressions. The methodology involves a hybrid neural network to extract audio and visual features from videos. For audio extraction, a Convolutional Neural Network (CNN) based on the log Mel-spectrogram is used, whereas a CNN built on the Census Transform is employed for visual extraction. The audio and visual features are reduced by PCA and LDA, and classified through KNN, SVM, Logistic Regression (LR), and Gaussian Naïve Bayes (GNB). This approach achieves competitive recognition rates, especially on a spontaneous data set.
The fourth investigates the problem of detecting Down syndrome from photographs. A geometric descriptor is proposed to extract facial features. Experiments performed on a public data set show the effectiveness of the developed methodology.
The last methodology addresses the recognition of genetic disorders in photographs. This method extracts facial features using deep neural network features and anthropometric measurements. Experiments conducted on a public data set achieve competitive recognition rates.
Doctorate in Computer Science (Doutora em Ciência da Computação). 140532/2019-6, CNPQ, CAPE
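As a rough illustration of the census-transform histogram that the first methodology builds on, the sketch below computes an 8-bit census code for every interior pixel of a grayscale image and counts the codes into a 256-bin histogram. It is a minimal sketch, not the thesis implementation: the function names, the row-major bit ordering, and the "center at least as bright as neighbor" comparison are assumptions made for this example, and the image is represented as a plain list of lists.

```python
def census_transform(image):
    """Return the 8-bit census code of every interior pixel of a 2D grayscale image.

    `image` is a list of equal-length rows of numbers (an assumption of this sketch).
    """
    rows, cols = len(image), len(image[0])
    # Neighbor offsets in row-major order: top-left first, bottom-right last.
    offsets = [(-1, -1), (-1, 0), (-1, 1),
               (0, -1),           (0, 1),
               (1, -1),  (1, 0),  (1, 1)]
    codes = []
    for r in range(1, rows - 1):
        row_codes = []
        for c in range(1, cols - 1):
            code = 0
            for dr, dc in offsets:
                # Shift in a 1 bit when the center is at least as bright as the neighbor.
                bit = 1 if image[r][c] >= image[r + dr][c + dc] else 0
                code = (code << 1) | bit
            row_codes.append(code)
        codes.append(row_codes)
    return codes


def centrist_histogram(image):
    """Count census codes into a 256-bin histogram (the feature vector)."""
    hist = [0] * 256
    for row in census_transform(image):
        for code in row:
            hist[code] += 1
    return hist
```

In a pipeline like the one described, a histogram of this kind would be the feature vector passed on to dimensionality reduction (PCA, LDA) and a classifier (KNN, SVM); those stages are omitted here.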
A review of arthritis diagnosis techniques in artificial intelligence era: Current trends and research challenges
Deep learning, a branch of artificial intelligence, has achieved unprecedented performance in several domains, including medicine, where it assists with efficient diagnosis of diseases, prediction of disease progression, and pre-screening for physicians. Owing to these breakthroughs, deep learning is now being used for the diagnosis of arthritis, a chronic disease affecting people from youth to old age. This paper provides a survey of recent and representative deep learning techniques (published between 2018 and 2020) for the diagnosis of osteoarthritis and rheumatoid arthritis. The paper also reviews traditional machine learning methods (published from 2015 onward) and their application to the diagnosis of these diseases. The paper identifies open problems and research gaps. We believe that deep learning can assist general practitioners and consultants in predicting the course of the disease, making treatment propositions, and appraising their potential benefits.
Application of Computer Vision and Mobile Systems in Education: A Systematic Review
The computer vision industry has experienced a significant surge in growth, resulting in numerous promising breakthroughs in computer intelligence. The present review paper outlines the advantages and potential future implications of utilizing this technology in education. A total of 84 research publications have been thoroughly scrutinized and analyzed. The study revealed that computer vision technology integrated with a mobile application is exceptionally useful in monitoring students' perceptions and mitigating academic dishonesty. Additionally, it facilitates the digitization of handwritten scripts for plagiarism detection and automates attendance tracking to optimize valuable classroom time. Furthermore, several potential applications of computer vision technology for educational institutions have been proposed to enhance students' learning processes in various faculties, such as engineering, medical science, and others. Moreover, the technology can also aid in creating a safer campus environment by automatically detecting abnormal activities such as ragging, bullying, and harassment.
Segmentation of pelvic structures from preoperative images for surgical planning and guidance
Prostate cancer is one of the most frequently diagnosed malignancies globally and the second leading cause of cancer-related mortality in males in the developed world. In recent decades, many techniques have been proposed for prostate cancer diagnosis and treatment. With the development of imaging technologies such as CT and MRI, image-guided procedures have become increasingly important as a means to improve clinical outcomes. Analysis of the preoperative images and construction of 3D models prior to treatment would help doctors to better localize and visualize the structures of interest, plan the procedure, diagnose disease and guide the surgery or therapy. This requires efficient and robust medical image analysis and segmentation technologies to be developed.
The thesis mainly focuses on the development of segmentation techniques in pelvic MRI for image-guided robotic-assisted laparoscopic radical prostatectomy and external-beam radiation therapy. A fully automated multi-atlas framework is proposed for bony pelvis segmentation in MRI, using the guidance of MRI AE-SDM. With the guidance of the AE-SDM, a multi-atlas segmentation algorithm is used to delineate the bony pelvis in a new MRI where there is no CT available. The proposed technique outperforms state-of-the-art algorithms for MRI bony pelvis segmentation. With the SDM of the pelvis and its segmented surface, an accurate 3D pelvimetry system is designed and implemented to measure a comprehensive set of pelvic geometric parameters for the examination of the relationship between these parameters and the difficulty of robotic-assisted laparoscopic radical prostatectomy. This system can be used in both a manual and an automated manner with a user-friendly interface.
A fully automated and robust multi-atlas based segmentation has also been developed to delineate the prostate in diagnostic MR scans, which have large variation in both intensity and shape of prostate. Two image analysis techniques are proposed, including patch-based label fusion with local appearance-specific atlases and multi-atlas propagation via a manifold graph on a database of both labeled and unlabeled images when limited labeled atlases are available. The proposed techniques can achieve more robust and accurate segmentation results than other multi-atlas based methods.
The seminal vesicles are also an interesting structure for therapy planning, particularly for external-beam radiation therapy. As existing methods fail at the very onerous task of segmenting the seminal vesicles, a multi-atlas learning framework via random decision forests with graph cuts refinement has further been proposed to solve this difficult problem. Motivated by the performance of this technique, I further extend the multi-atlas learning to segment the prostate fully automatically using multispectral (T1 and T2-weighted) MR images via hybrid RF classifiers and a multi-image graph cuts technique. The proposed method compares favorably to the previously proposed multi-atlas based prostate segmentation.
The work in this thesis covers different techniques for pelvic image segmentation in MRI. These techniques have been continually developed and refined, and their application to different specific problems shows ever more promising results.
Open Access
Camera-Based Heart Rate Extraction in Noisy Environments
Remote photoplethysmography (rPPG) is a non-invasive technique that uses video to measure vital signs such as the heart rate (HR). In rPPG estimation, noise can introduce artifacts that distort the rPPG signal and jeopardize accurate HR measurement. Because most rPPG studies have been conducted in lab-controlled environments, the issue of noise in realistic conditions remains open.
This thesis aims to examine the challenges of noise in rPPG estimation in realistic scenarios, specifically investigating the effect of noise arising from illumination variation and motion artifacts on the predicted rPPG HR. To mitigate the impact of noise, a modular rPPG measurement framework, comprising data preprocessing, region of interest, signal extraction, preparation, processing, and HR extraction, is developed. The proposed pipeline is tested on the LGI-PPGI-Face-Video-Database public dataset, hosting four different candidates and real-life scenarios. In the RoI module, raw rPPG signals were extracted from the dataset using three machine learning-based face detectors, namely Haarcascade, Dlib, and MediaPipe, in parallel. Subsequently, the collected signals underwent preprocessing, independent component analysis, denoising, and frequency domain conversion for peak detection.
Overall, the Dlib face detector yields the most successful HR estimates for the majority of scenarios. In 50% of all scenarios and candidates, the average predicted HR for Dlib is either in line with or very close to the average reference HR. The HRs extracted with the Haarcascade and MediaPipe architectures account for 31.25% and 18.75% of plausible results, respectively. The analysis highlighted the importance of fixated facial landmarks in collecting quality raw data and reducing noise.
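The final step of the pipeline described above, frequency-domain conversion followed by peak detection, can be sketched as follows. This is a minimal sketch under stated assumptions, not the thesis code: the function name `estimate_heart_rate_bpm`, the 42 to 240 bpm search band, and the use of a plain discrete Fourier transform on a mean-centered signal are choices made for this example. Preprocessing, independent component analysis, and denoising are assumed to have already happened.

```python
import cmath
import math


def estimate_heart_rate_bpm(signal, fps, min_bpm=42.0, max_bpm=240.0):
    """Estimate heart rate as the strongest spectral peak inside a plausible band.

    `signal` is a sequence of rPPG samples taken at `fps` frames per second.
    The band limits are assumptions of this sketch, not values from the thesis.
    """
    n = len(signal)
    mean = sum(signal) / n
    centered = [s - mean for s in signal]  # remove the constant (DC) component

    best_freq_hz, best_power = None, -1.0
    for k in range(1, n // 2 + 1):
        freq_hz = k * fps / n
        if freq_hz < min_bpm / 60.0 or freq_hz > max_bpm / 60.0:
            continue  # skip frequencies outside the plausible heart-rate band
        # Discrete Fourier transform coefficient for frequency bin k.
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centered))
        power = abs(coeff) ** 2
        if power > best_power:
            best_freq_hz, best_power = freq_hz, power

    if best_freq_hz is None:
        raise ValueError("signal too short to resolve the heart-rate band")
    return best_freq_hz * 60.0
```

Restricting the search to a physiological band is one simple way to keep slow illumination drift from being mistaken for the pulse; the frequency resolution is `fps / len(signal)`, so longer windows give finer HR estimates.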
Automated axial right ventricle to left ventricle diameter ratio computation in computed tomography pulmonary angiography
Automated medical image analysis requires methods to localize anatomic structures in the presence of normal interpatient variability, pathology, and the different protocols used to acquire images for different clinical settings. Recent advances have improved object detection in the context of natural images, but they have not been adapted to the 3D context of medical images. In this paper we present a 2.5D object detector designed to locate, without any user interaction, the left and right heart ventricles in Computed Tomography Pulmonary Angiography (CTPA) images. A 2D object detector is trained to find ventricles on axial slices. Those detections are automatically clustered according to their size and position. The cluster with the highest score, representing the 3D location of the ventricle, is then selected. The proposed method is validated in 403 CTPA studies obtained in patients with clinically suspected pulmonary embolism. Both ventricles are properly detected in 94.7% of the cases. The proposed method is very generic and can be easily adapted to detect other structures in medical images.
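The idea of grouping per-slice 2D detections by size and position and keeping the best-scoring group can be sketched as below. The abstract does not give the clustering criterion or the scoring rule, so everything specific here is hypothetical: the `Detection` and `Cluster` types, the greedy assignment, the distance and size-ratio thresholds, and the use of summed detector scores as the cluster score.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Detection:
    slice_index: int  # axial slice on which the 2D detector fired
    cx: float         # box center, x
    cy: float         # box center, y
    size: float       # box size (for example its width), must be positive
    score: float      # detector confidence


@dataclass
class Cluster:
    members: list = field(default_factory=list)

    def mean(self, attr):
        return sum(getattr(d, attr) for d in self.members) / len(self.members)

    @property
    def score(self):
        # Hypothetical scoring rule: sum of member confidences.
        return sum(d.score for d in self.members)


def cluster_detections(detections, max_center_dist=10.0, max_size_ratio=1.5):
    """Greedily group detections whose position and size agree with a cluster."""
    clusters = []
    for det in sorted(detections, key=lambda d: d.slice_index):
        for cluster in clusters:
            dist = math.hypot(det.cx - cluster.mean("cx"),
                              det.cy - cluster.mean("cy"))
            mean_size = cluster.mean("size")
            ratio = max(det.size, mean_size) / min(det.size, mean_size)
            if dist <= max_center_dist and ratio <= max_size_ratio:
                cluster.members.append(det)
                break
        else:
            # No existing cluster matched, so start a new one.
            clusters.append(Cluster([det]))
    return clusters


def select_best_cluster(detections, **kwargs):
    """Return the cluster with the highest score."""
    return max(cluster_detections(detections, **kwargs), key=lambda c: c.score)
```

With a summed score, a run of consistent detections across neighboring slices outweighs a single confident but isolated false positive, which is the intuition behind moving from 2D detections to a 3D location.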
Object Detection in 20 Years: A Survey
Object detection, as one of the most fundamental and challenging problems in computer vision, has received great attention in recent years. Its development in the past two decades can be regarded as an epitome of computer vision history. If we think of today's object detection as technical aesthetics under the power of deep learning, then turning back the clock 20 years we would witness the wisdom of the cold weapon era. This paper extensively reviews 400+ papers on object detection in the light of its technical evolution, spanning over a quarter-century (from the 1990s to 2019). A number of topics are covered in this paper, including the milestone detectors in history, detection datasets, metrics, fundamental building blocks of the detection system, speed-up techniques, and the recent state-of-the-art detection methods. This paper also reviews some important detection applications, such as pedestrian detection, face detection, and text detection, and makes an in-depth analysis of their challenges as well as technical improvements in recent years.
Comment: This work has been submitted to IEEE TPAMI for possible publication.