
    Automatic object classification for surveillance videos.

    PhD thesis. The recent popularity of surveillance video systems, especially in urban scenarios, demands the development of visual techniques for monitoring purposes. A primary step towards intelligent surveillance video systems is automatic object classification, which remains an open research problem and the keystone for the development of more specific applications. Typically, object representation is based on inherent visual features. However, psychological studies have demonstrated that human beings can routinely categorise objects according to their behaviour. The gap between the features a computer can automatically extract, such as appearance-based features, and the concepts that human beings perceive effortlessly but that remain unattainable for machines, such as behaviour, is commonly known as the semantic gap. Consequently, this thesis proposes to narrow the semantic gap and bring machine and human understanding together for object classification. A Surveillance Media Management framework is proposed to automatically detect and classify objects by analysing both the physical properties inherent in their appearance (machine understanding) and the behaviour patterns that require a higher level of understanding (human understanding). Finally, a probabilistic multimodal fusion algorithm bridges the gap by performing automatic classification that considers both machine and human understanding. The performance of the proposed Surveillance Media Management framework has been thoroughly evaluated on outdoor surveillance datasets. The experiments conducted demonstrate that combining machine and human understanding substantially enhances object classification performance, and that the inclusion of human reasoning and understanding provides the essential information to bridge the semantic gap towards smart surveillance video systems.
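    The abstract does not detail the fusion algorithm itself. Purely as an illustration, a minimal Python sketch of one standard way to fuse two per-class posteriors (weighted log-linear pooling; the class names, probabilities, and weight below are invented for the example, not taken from the thesis) might look like this:

        import numpy as np

        def fuse_posteriors(p_appearance, p_behaviour, alpha=0.5):
            """Weighted log-linear (geometric) pooling of two class posteriors."""
            p_a = np.asarray(p_appearance, dtype=float)
            p_b = np.asarray(p_behaviour, dtype=float)
            fused = (p_a ** alpha) * (p_b ** (1.0 - alpha))  # weighted geometric mean
            return fused / fused.sum()                       # renormalize to a distribution

        # Invented example: posteriors over {person, vehicle, other}.
        classes = ["person", "vehicle", "other"]
        appearance = [0.5, 0.4, 0.1]   # machine understanding (visual features)
        behaviour  = [0.8, 0.1, 0.1]   # human understanding (behaviour patterns)
        fused = fuse_posteriors(appearance, behaviour, alpha=0.4)
        print(classes[int(np.argmax(fused))], fused.round(3))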

    Using facial expression recognition for crowd monitoring.

    Master of Science in Engineering, University of KwaZulu-Natal, Durban, 2017. In recent years, crowd monitoring techniques have attracted growing interest in the field of computer vision due to their ability to monitor groups of people in crowded areas where conventional image processing methods would not suffice. Existing crowd monitoring techniques focus heavily on analyzing a crowd as a single entity, usually in terms of its density and movement pattern. While these techniques are well suited to identifying dangerous and emergency situations, such as a large group of people exiting a building at once, they are very limited when it comes to identifying emotion within a crowd. By isolating different types of emotion within a crowd, we aim to predict the mood of a crowd even in scenes of non-panic. In this work, we propose a novel crowd monitoring system based on estimating crowd emotion using Facial Expression Recognition (FER). In the past decade, both FER and activity recognition have been proposed for human emotion detection. However, facial expression is arguably more descriptive when identifying emotion and is less likely to be obscured in crowded environments than body posture. Given a crowd image, the popular Viola and Jones face detection algorithm is used to detect and extract unobscured faces from individuals in the crowd. A robust and efficient appearance-based method of FER, Gradient Local Ternary Pattern (GLTP), is used together with a machine learning algorithm, Support Vector Machine (SVM), to extract and classify each facial expression as one of seven universally accepted emotions (joy, surprise, anger, fear, disgust, sadness or neutral). Crowd emotion is estimated by isolating groups of similar emotion based on their relative size and weighting. To validate the effectiveness of the proposed system, a series of cross-validation tests is performed using a novel crowd emotion dataset with known ground-truth emotions. The results show that the system presented is able to accurately and efficiently predict multiple classes of crowd emotion even in non-panic situations where movement and density information may be incomplete. In the future, this type of system can be used for many security applications, such as alerting authorities in real time to potentially aggressive crowds of people.
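    As a rough illustration of the pipeline described (a sketch, not the author's implementation), the snippet below detects faces with OpenCV's Haar-cascade Viola-Jones detector, extracts a feature vector per face, classifies it with a scikit-learn SVM, and takes the largest group of matching predictions as the crowd emotion. GLTP is not available in common libraries, so a plain intensity histogram stands in for it here, and the trained classifier `clf` is assumed to exist:

        import cv2
        import numpy as np
        from collections import Counter
        from sklearn.svm import SVC

        EMOTIONS = ["joy", "surprise", "anger", "fear", "disgust", "sadness", "neutral"]

        # Viola-Jones face detector shipped with OpenCV.
        detector = cv2.CascadeClassifier(
            cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

        def face_features(gray_face):
            # Placeholder for GLTP: a normalized 64-bin intensity histogram.
            face = cv2.resize(gray_face, (48, 48))
            hist = cv2.calcHist([face], [0], None, [64], [0, 256]).ravel()
            return hist / (hist.sum() + 1e-8)

        def crowd_emotion(image_bgr, clf):
            """Majority emotion among all detected, unobscured faces."""
            gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
            boxes = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
            # Labels are assumed to be integer indices into EMOTIONS.
            votes = [EMOTIONS[int(clf.predict([face_features(gray[y:y+h, x:x+w])])[0])]
                     for (x, y, w, h) in boxes]
            return Counter(votes).most_common(1)[0][0] if votes else None

        # clf is assumed trained offline on labelled face features, e.g.:
        # clf = SVC(kernel="rbf").fit(train_features, train_labels)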

    Recording behaviour of indoor-housed farm animals automatically using machine vision technology: a systematic review

    Large-scale phenotyping of animal behaviour traits is time consuming and has led to increased demand for technologies that can automate these procedures. Automated tracking of animals has been successful in controlled laboratory settings, but recording from animals in large groups in highly variable farm settings presents challenges. The aim of this review is to provide a systematic overview of the advances that have occurred in automated, high-throughput image detection of farm animal behavioural traits with welfare and production implications. Peer-reviewed publications written in English were reviewed systematically following Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. After identification, screening, and assessment for eligibility, 108 publications met these specifications and were included for qualitative synthesis. Data collected from the papers included camera specifications, housing conditions, group size, algorithm details, procedures, and results. Most studies utilized standard digital colour video cameras for data collection, with increasing use of 3D cameras in papers published after 2013. Papers including pigs (across production stages) were the most common (n = 63). The most common behaviours recorded included activity level, area occupancy, aggression, gait scores, resource use, and posture. Our review revealed many overlaps in the methods applied to analysing behaviour, and most studies started from scratch instead of building upon previous work. Training and validation sample sizes were generally small (mean ± s.d. number of groups = 3.8 ± 5.8), and data collection and testing took place in relatively controlled environments. To advance our ability to automatically phenotype behaviour, future research should build upon existing knowledge and validate technology under commercial settings, and publications should explicitly describe recording conditions in detail to allow studies to be reproduced.

    Signal processing and machine learning techniques for automatic image-based facial expression recognition

    PhD thesis. In this thesis, novel signal processing and machine learning techniques are proposed and evaluated for automatic image-based facial expression recognition, aimed at progressing towards real-world operation. A thorough evaluation of the performance of certain image-based expression recognition techniques is performed using a posed database and, for the first time, three progressively more challenging spontaneous databases. These methods exploit the principles of sparse representation theory with identity-independent expression recognition using difference images. The second contribution exploits a low-complexity method to extract geometric features from facial expression images. The misalignment problem of the training images is solved, and the performance of both geometric and appearance features is assessed on the same three spontaneous databases. A deep network framework that contains auto-encoders is used to form an improved classifier. The final work focuses on enhancing expression recognition performance through the selection and fusion of different types of features, comprising geometric features and two sorts of appearance features. This provides a rich feature vector by which the best representation of the spontaneous facial features is obtained. Subsequently, the computational complexity is reduced while maintaining important location information by concentrating on the crucial facial regions as the basic processing unit instead of the entire face, where local binary pattern and local phase quantization features are extracted automatically by detecting two important regions of the face. Next, an automatic method splits the training effort of the initial network into several networks and multi-classifiers, namely a surface network and a bottom network, to enhance performance. All methods are evaluated in a MATLAB framework, with confusion matrices and average facial expression recognition accuracy used as the performance metrics. Ministry of Higher Education and Scientific Research in Iraq (MOHESR).
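    To make two of the building blocks concrete, here is a small sketch (an illustration, not the thesis code) of an identity-independent difference image and of local binary pattern histograms extracted from selected facial regions, using scikit-image. Local phase quantization has no common library implementation, so only LBP is shown, and the region boxes are assumed to come from a separate detection step:

        import numpy as np
        from skimage.feature import local_binary_pattern

        def difference_image(expressive, neutral):
            """Identity-independent representation: an expressive face minus the
            same subject's neutral face (both grayscale, same size)."""
            return expressive.astype(float) - neutral.astype(float)

        def lbp_histogram(gray, P=8, R=1.0):
            """Normalized histogram of uniform LBP codes over an image region."""
            codes = local_binary_pattern(gray, P, R, method="uniform")
            hist, _ = np.histogram(codes, bins=P + 2, range=(0, P + 2), density=True)
            return hist

        def region_features(gray, regions):
            """Concatenate LBP histograms from key facial regions (e.g. eyes and
            mouth), each given as an assumed (row, col, height, width) box."""
            return np.concatenate(
                [lbp_histogram(gray[r:r + h, c:c + w]) for (r, c, h, w) in regions])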

    Patch-based methods for variational image processing problems

    Image processing problems are notoriously difficult. To name a few of these difficulties: they are usually ill-posed, they involve a huge number of unknowns (from one to several per pixel!), and images cannot be considered as the linear superposition of a few physical sources, as they contain many different scales and non-linearities. However, if one considers, instead of images as a whole, small blocks (or patches) inside the pictures, many of these hurdles vanish and the problems become much easier to solve, at the cost of increasing the dimensionality of the data to process. Following the seminal NL-means algorithm in 2005-2006, methods that consider only the visual correlation between patches and ignore their spatial relationship are called non-local methods. While powerful, it is an arduous task to define non-local methods without resorting to heuristic formulations or complex mathematical frameworks. On the other hand, another powerful property has brought global image processing algorithms one step further: the sparsity of images in well-chosen representation bases. However, this property is difficult to embed naturally in non-local methods, yielding algorithms that are usually inefficient or convoluted. In this thesis, we explore alternative approaches to non-locality, with the goals of i) developing universal approaches that can handle local and non-local constraints and ii) leveraging the qualities of both non-locality and sparsity. For the first point, we will see that embedding the patches of an image in a graph-based framework yields a simple algorithm that can switch from local to non-local diffusion, which we apply to the problem of large-area image inpainting. For the second point, we first study a fast patch preselection process that is able to group patches according to their visual content. This preselection operator then serves as input to a social sparsity enforcing operator that creates sparse groups of jointly sparse patches, thus exploiting all the redundancies present in the data within a simple mathematical framework. Finally, we study the problem of reconstructing plausible patches from a few binarized measurements. We show that this task can be achieved for popular binarized image keypoint descriptors, thus demonstrating a potential privacy issue in mobile visual recognition applications, but also opening a promising way towards the design and construction of a new generation of smart cameras.
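    As an illustration of the graph-based view of patches (a sketch, not the thesis's algorithm), the snippet below extracts overlapping patches and links each one to its k most visually similar patches anywhere in the image, which is exactly the non-local neighbourhood a diffusion process could then operate on. Patch size, stride, and k are arbitrary choices here:

        import numpy as np
        from skimage.util import view_as_windows
        from sklearn.neighbors import NearestNeighbors

        def patch_knn_graph(image, patch=8, stride=4, k=10):
            """Link every patch to its k most similar patches anywhere in the
            image: the non-local neighbourhood, independent of spatial distance."""
            windows = view_as_windows(image, (patch, patch), step=stride)
            n_rows, n_cols = windows.shape[:2]
            vectors = windows.reshape(n_rows * n_cols, -1).astype(float)
            nn = NearestNeighbors(n_neighbors=k + 1).fit(vectors)
            dist, idx = nn.kneighbors(vectors)
            return vectors, idx[:, 1:], dist[:, 1:]   # drop each patch's self-match

        # Toy usage on a random "image".
        rng = np.random.default_rng(0)
        vecs, neighbours, dists = patch_knn_graph(rng.random((64, 64)))
        print(vecs.shape, neighbours.shape)   # (225, 64) (225, 10)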

    An affective computing and image retrieval approach to support diversified and emotion-aware reminiscence therapy sessions

    Dementia is one of the major causes of dependency and disability among elderly people worldwide. Reminiscence therapy is an inexpensive non-pharmacological therapy commonly used within dementia care due to its therapeutic value for people with dementia. This therapy is useful for creating engaging communication between people with dementia and the rest of the world by using the preserved abilities of long-term memory, rather than emphasizing existing impairments, to alleviate the experience of failure and social isolation. Current assistive technological solutions improve reminiscence therapy by providing a more lively and engaging experience to all participants (people with dementia, family members, and clinicians), but they are not free of drawbacks: a) the multimedia data used remains unchanged throughout sessions, and there is a lack of customization for each person with dementia; b) they do not take into account the emotions conveyed by the multimedia data used, nor the person with dementia's emotional reactions to the multimedia presented; c) the caregivers' perspective has not yet been fully taken into account. To overcome these challenges, we followed a user-centered design approach through worldwide surveys, follow-up interviews, and focus groups with formal and informal caregivers to inform the design of technological solutions within dementia care. To fulfill the requirements identified, we propose novel methods that facilitate the inclusion of emotions in the loop during reminiscence therapy to personalize and diversify the content of the sessions over time. Contributions from this thesis include: a) a set of validated functional requirements gathered from formal and informal caregivers, the expected outcomes from the fulfillment of each requirement, and an architecture template for the development of assistive technology solutions for dementia care; b) an end-to-end approach to automatically identify multiple kinds of emotional information conveyed by images; c) an approach to reduce the number of images that need to be annotated by humans without compromising the recognition models' performance; d) an interpretable late-fusion technique that dynamically combines multiple content-based image retrieval systems to effectively search for similar images, diversifying and personalizing the pool of images available for use in sessions.
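    As a concrete illustration of contribution d) (a sketch under assumed inputs, not the thesis's actual technique), an interpretable late fusion can be as simple as a weighted sum of normalized similarity scores from each retrieval system, keeping the per-system contributions available for inspection. The systems, scores, and weights below are invented:

        import numpy as np

        def late_fusion(score_lists, weights):
            """Interpretable late fusion of several content-based image retrieval
            systems: a weighted sum of per-system similarity scores, keeping each
            system's contribution to the final ranking inspectable.

            score_lists: one array per system, aligned by candidate image and
                         already normalized to [0, 1]; weights: one per system.
            """
            scores = np.vstack(score_lists)
            w = np.asarray(weights, dtype=float)
            w = w / w.sum()
            fused = w @ scores                     # weighted score per image
            ranking = np.argsort(fused)[::-1]      # best matches first
            contributions = w[:, None] * scores    # per-system share, for auditing
            return ranking, fused, contributions

        # Invented similarity scores for 4 candidate images from 3 systems.
        colour  = np.array([0.9, 0.2, 0.6, 0.4])
        texture = np.array([0.7, 0.3, 0.8, 0.1])
        emotion = np.array([0.5, 0.9, 0.4, 0.2])
        rank, fused, contrib = late_fusion([colour, texture, emotion], [1, 1, 2])
        print(rank, fused.round(3))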

    Quantitative analysis with machine learning models for multi-parametric brain imaging data

    Gliomas are the most common primary malignant brain tumor in adults. With dramatic increases in computational power and improvements in image analysis algorithms, computer-aided medical image analysis has been introduced into clinical applications. Precise tumor grading and genotyping play an indispensable role in clinical diagnosis, treatment, and prognosis. Glioma diagnostic procedures include histopathological imaging tests, molecular imaging scans, and tumor grading. Pathologic review of tumor morphology in histologic sections is the traditional method for cancer classification and grading, yet human review has limitations that can result in low reproducibility and poor inter-observer agreement. Compared with histopathological images, magnetic resonance (MR) imaging presents different structural and functional features, which might serve as noninvasive surrogates for tumor genotypes. Therefore, computer-aided image analysis has been adopted in clinical applications, as it might partially overcome these shortcomings through its capacity to quantitatively and reproducibly measure multilevel features from multi-parametric medical information. Imaging features obtained from a single imaging modality do not fully represent the disease, so quantitative imaging features at the morphological, structural, cellular, and molecular levels, derived from multi-modality medical images, should be integrated into computer-aided medical image analysis. The difference in image quality between modalities is a further challenge in this field. In this thesis, we aim to integrate quantitative imaging data obtained from multiple modalities into mathematical models of tumor response prediction to gain additional insights into their practical predictive value. Our major contributions are: 1. First, to address imaging quality differences and observer dependence in histological image diagnosis, we propose an automated machine-learning brain tumor-grading platform that investigates the contributions of multiple parameters from multimodal data, including imaging features from Whole Slide Images (WSI) and the proliferation marker KI-67. For each WSI, we extract both visual parameters, such as morphology parameters, and sub-visual parameters, including first-order and second-order features. A quantitative, interpretable machine learning approach (Local Interpretable Model-Agnostic Explanations, LIME) is then used to measure the contribution of each feature for a single case. Most grading systems based on machine learning models are considered "black boxes," whereas with this system the clinically trusted reasoning can be revealed. The quantitative analysis and explanation may help clinicians better understand the disease and choose optimal treatments to improve clinical outcomes. 2. Building on the proposed automated brain tumor-grading platform, we introduce multimodal Magnetic Resonance Images (MRIs) into our research. A new imaging-tissue-correlation-based approach called RA-PA-Thomics is proposed to predict the IDH genotype. Inspired by the concept of image fusion, we integrate multimodal MRIs and scans of histopathological images for indirect, fast, and cost-saving IDH genotyping. The proposed model has been verified against multiple evaluation criteria on the integrated dataset and compared to results in the prior art. The experimental dataset includes public datasets and image information from two hospitals. Experimental results indicate that the proposed model improves the accuracy of glioma grading and genotyping.
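    For contribution 1, the abstract names Local Interpretable Model-Agnostic Explanations (LIME). A minimal sketch of per-case feature attribution with the `lime` package on tabular features is shown below; the feature names, toy data, and random-forest grader are placeholders, not the platform's real WSI features or model:

        import numpy as np
        from sklearn.ensemble import RandomForestClassifier
        from lime.lime_tabular import LimeTabularExplainer

        rng = np.random.default_rng(0)

        # Placeholder features standing in for WSI-derived parameters and KI-67.
        feature_names = ["nuclei_area", "nuclei_count", "glcm_contrast", "ki67_index"]
        X = rng.random((200, 4))
        y = (X[:, 3] > 0.5).astype(int)        # toy low/high grade labels

        clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

        explainer = LimeTabularExplainer(
            X, feature_names=feature_names,
            class_names=["low_grade", "high_grade"], mode="classification")

        # Per-case explanation: which features drove this slide's predicted grade.
        exp = explainer.explain_instance(X[0], clf.predict_proba, num_features=4)
        print(exp.as_list())   # (feature condition, signed contribution) pairs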