12 research outputs found

    Intelligent computer vision processing techniques for fall detection in enclosed environments

    Detecting unusual movement (falls) among elderly people in enclosed environments is receiving increasing attention and is likely to have a massive social and economic impact. In this thesis, new techniques based on intelligent computer vision processing are proposed to detect falls in indoor environments for senior citizens living independently, such as in intelligent homes. Different types of features extracted from video-camera recordings are exploited together with both background subtraction analysis and machine learning techniques. Initially, an improved background subtraction method is used to extract the region of a person in the recording of a room environment. A selective updating technique is introduced for adapting the background model to changes, ensuring that the human body region is not absorbed into the background model when it remains static for prolonged periods of time. Since two-dimensional features can generate false alarms and are not invariant to viewing direction, more robust three-dimensional features are next extracted from a three-dimensional person representation formed from the measurements of multiple calibrated video-cameras. The extracted three-dimensional features are used to construct a single Gaussian model via the maximum likelihood technique, which can distinguish falls from non-fall activity by comparing the model output with a single threshold. In the final part of the work, new fall detection schemes which use only one uncalibrated video-camera are tested in a real elderly person's home environment. These approaches are based on two-dimensional features which describe different human body postures. The extracted features are used to construct a supervised posture classification method for abnormal posture detection. Certain rules, set according to the characteristics of fall activities, are lastly used to build a robust fall detection model.
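
    As a hedged illustration of the single-Gaussian step described above, the following sketch fits a Gaussian to three-dimensional features of normal activity by maximum likelihood and flags frames whose log-likelihood falls below a single threshold; the feature choice, toy data and threshold value are assumptions, not the thesis implementation.

```python
import numpy as np

def fit_gaussian(features):
    """Maximum-likelihood fit of a single Gaussian to (N, d) features."""
    mu = features.mean(axis=0)
    cov = np.cov(features, rowvar=False)
    return mu, cov

def log_likelihood(x, mu, cov):
    """Log-density of one d-dimensional feature vector under the model."""
    d = len(mu)
    diff = x - mu
    return -0.5 * (diff @ np.linalg.inv(cov) @ diff
                   + np.log(np.linalg.det(cov))
                   + d * np.log(2 * np.pi))

# Toy 3-D features (e.g. body width, depth, centroid height) of normal activity.
normal = np.random.randn(500, 3) * [0.05, 0.05, 0.1] + [0.4, 0.5, 1.7]
mu, cov = fit_gaussian(normal)

threshold = -10.0                      # assumed; tuned on validation data in practice
frame = np.array([0.9, 0.4, 0.3])      # low centroid height: candidate fall
if log_likelihood(frame, mu, cov) < threshold:
    print("possible fall detected")
```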

    State of the Art in Face Recognition

    Notwithstanding the tremendous effort devoted to the face recognition problem, it is not yet possible to design a face recognition system with performance close to that of humans. New computer vision and pattern recognition approaches need to be investigated. Even knowledge and perspectives from fields such as psychology and neuroscience must be incorporated into face recognition research to design a robust system. Indeed, many more efforts are required to arrive at a human-like face recognition system. This book makes an effort to reduce the gap between the current state of face recognition research and the state still to be reached.

    Computer vision for sequential non-invasive microscopy imaging cytometry with applications in embryology

    Many in vitro cytometric methods require the sample to be destroyed in the process. Using image analysis of non-invasive microscopy techniques, it is possible to monitor samples undisturbed in their natural environment, providing new insights into cell development, morphology and health. As the effect on the sample is minimized, imaging can be sustained for long uninterrupted periods of time, making it possible to study temporal events as well as individual cells over time. These methods are applicable in a number of fields, and are of particular importance in embryological studies, where no sample interference is acceptable. Using long-term image capture and digital image cytometry of growing embryos, it is possible to perform morphokinetic screening, automated analysis and annotation using appropriate software tools. Based on a review of the literature, one such framework is proposed and the required methods are developed and evaluated. Results are shown in tracking embryos, embryo cell segmentation, analysis of internal cell structures and profiling of cell growth and activity. Two related extensions of the framework, into three-dimensional embryo analysis and adherent cell monitoring, are described.
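
    As a minimal sketch of the kind of image cytometry step described above (not the thesis pipeline itself), the following segments cell-like blobs in a grayscale frame by Otsu thresholding and connected-component labelling, then reports per-object measurements; the synthetic frame and size filter are assumptions.

```python
import numpy as np
from skimage import filters, measure, morphology

def segment_cells(image):
    """Threshold a grayscale frame and return labelled region measurements."""
    mask = image > filters.threshold_otsu(image)
    mask = morphology.remove_small_objects(mask, min_size=50)  # drop noise specks
    return measure.regionprops(measure.label(mask))

frame = np.random.rand(256, 256)       # stand-in for a microscopy frame
for region in segment_cells(frame):
    print(region.label, region.area, region.centroid)
```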

    HIERARCHICAL LEARNING OF DISCRIMINATIVE FEATURES AND CLASSIFIERS FOR LARGE-SCALE VISUAL RECOGNITION

    Enabling computers to recognize objects present in images has been a long-standing but tremendously challenging problem in the field of computer vision for decades. Beyond the difficulties resulting from huge appearance variations, large-scale visual recognition poses unprecedented challenges when the number of visual categories being considered grows to thousands and the number of images increases to millions. This dissertation contributes to addressing a number of the challenging issues in large-scale visual recognition. First, we develop an automatic image-text alignment method to collect massive amounts of labeled images from the Web for training visual concept classifiers. Specifically, we first crawl a large number of cross-media Web pages containing Web images and their auxiliary texts, and then segment them into a collection of image-text pairs. We then show that near-duplicate image clustering according to visual similarity can significantly reduce the uncertainty about the relatedness of Web images' semantics to their auxiliary text terms or phrases. Finally, we empirically demonstrate that a random walk over a newly proposed phrase correlation network can help to achieve more precise image-text alignment by refining the relevance scores between Web images and their auxiliary text terms. Second, we propose a visual tree model to reduce the computational complexity of a large-scale visual recognition system by hierarchically organizing and learning the classifiers for a large number of visual categories in a tree structure. Compared to previous tree models, such as the label tree, our visual tree model does not require training a huge number of classifiers in advance, which is computationally expensive. Nevertheless, we experimentally show that the proposed visual tree achieves results that are comparable or even superior to other tree models in terms of recognition accuracy and efficiency. Third, we present a joint dictionary learning (JDL) algorithm which exploits the inter-category visual correlations to learn more discriminative dictionaries for image content representation. Given a group of visually correlated categories, JDL simultaneously learns one common dictionary and multiple category-specific dictionaries to explicitly separate the shared visual atoms from the category-specific ones. We accordingly develop three classification schemes to make full use of the dictionaries learned by JDL for visual content representation in the task of image categorization. Experiments on two image data sets which respectively contain 17 and 1,000 categories demonstrate the effectiveness of the proposed algorithm. In the last part of the dissertation, we develop a novel data-driven algorithm to quantitatively characterize the semantic gaps of different visual concepts for learning complexity estimation and inference model selection. The semantic gaps are estimated directly in the visual feature space, since the visual feature space is the common space for concept classifier training and automatic concept detection. We show that the quantitative characterization of the semantic gaps helps to automatically select more effective inference models for classifier training, which further improves the recognition accuracy.
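
    The random-walk refinement mentioned in the first contribution can be sketched as iterative score propagation over the phrase correlation network; the toy correlation matrix, initial relevance scores and damping factor below are illustrative assumptions, not values from the dissertation.

```python
import numpy as np

def random_walk_refine(W, scores, alpha=0.85, iters=50):
    """Propagate relevance scores over a phrase correlation network W."""
    P = W / W.sum(axis=1, keepdims=True)            # row-stochastic transition matrix
    r = scores.copy()
    for _ in range(iters):
        r = alpha * (P.T @ r) + (1 - alpha) * scores  # walk with restart at initial scores
    return r

W = np.array([[1.0, 0.8, 0.1],                      # toy phrase-phrase correlations
              [0.8, 1.0, 0.2],
              [0.1, 0.2, 1.0]])
initial = np.array([0.9, 0.2, 0.1])                 # initial image-phrase relevance
print(random_walk_refine(W, initial))               # correlated phrases gain relevance
```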

    Speech data analysis for semantic indexing of video of simulated medical crises.

    The Simulation for Pediatric Assessment, Resuscitation, and Communication (SPARC) group within the Department of Pediatrics at the University of Louisville was established to enhance the care of children by using simulation-based educational methodologies to improve patient safety and strengthen clinician-patient interactions. After each simulation session, the physician must manually review and annotate the recordings and then debrief the trainees. The physician responsible for the simulation has recorded hundreds of videos and is seeking solutions that can automate the process. This dissertation introduces our system for efficient segmentation and semantic indexing of videos of medical simulations using machine learning methods. It provides the physician with automated tools to review important sections of the simulation by identifying who spoke, when, and with what emotion. Only audio information is extracted and analyzed because the quality of the image recording is low and the visual environment is static for most parts. Our proposed system includes four main components: preprocessing, speaker segmentation, speaker identification, and emotion recognition. The preprocessing consists of first extracting the audio component from the video recording, and then extracting various low-level audio features to detect and remove silence segments. We investigate and compare two different approaches for this task: the first is threshold-based and the second is classification-based. The second main component of the proposed system consists of detecting speaker change points for the purpose of segmenting the audio stream. We propose two fusion methods for this task. The speaker identification and emotion recognition components of our system are designed to let users browse the video and retrieve shots that identify "who spoke, when, and the speaker's emotion" for further analysis. For this component, we propose two feature representation methods that map audio segments of arbitrary length to a feature vector with fixed dimensions. The first is based on soft bag-of-words (BoW) feature representations. In particular, we define three types of BoW that are based on crisp, fuzzy, and possibilistic voting. The second feature representation is a generalization of the BoW and is based on the Fisher Vector (FV). The FV uses the Fisher Kernel principle and combines the benefits of generative and discriminative approaches. The proposed feature representations are used within two learning frameworks. The first is supervised learning and assumes that a large collection of labeled training data is available; within this framework, we use standard classifiers including K-nearest neighbor (K-NN), support vector machine (SVM), and naive Bayes. The second framework is based on semi-supervised learning, where only a limited amount of labeled training samples is available; here we use an approach based on label propagation. Our proposed algorithms were evaluated using 15 medical simulation sessions. The results were analyzed and compared to those obtained using state-of-the-art algorithms. We show that our proposed speech segmentation fusion algorithms and feature mappings outperform existing methods. We also integrated all proposed algorithms and developed a GUI prototype system for subjective evaluation. This prototype processes medical simulation video and provides the user with a visual summary of the different speech segments. It also allows the user to browse videos and retrieve scenes that answer semantic queries such as: who spoke and when? Who interrupted whom? What was the emotion of the speaker? The GUI prototype can also provide summary statistics for each simulation video, for example: how long did each person speak? What is the longest uninterrupted speech segment? Is there an unusually large number of pauses within the speech segment of a given speaker?
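
    A hedged sketch of the crisp versus fuzzy bag-of-words voting described above follows: frame-level features of an arbitrary-length segment are quantised against k learned codewords, with crisp voting assigning each frame to its nearest codeword and fuzzy voting spreading memberships over all codewords; the synthetic MFCC-like frames, codebook size and fuzzifier are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def bow_histograms(frames, codebook, m=2.0):
    """Map (N, d) frames to fixed-size crisp and fuzzy BoW histograms."""
    d = np.linalg.norm(frames[:, None, :] - codebook[None, :, :], axis=2)  # (N, k)
    crisp = np.bincount(d.argmin(axis=1), minlength=len(codebook)).astype(float)
    u = np.maximum(d, 1e-9) ** (-2.0 / (m - 1.0))   # fuzzy c-means style memberships
    fuzzy = (u / u.sum(axis=1, keepdims=True)).sum(axis=0)
    return crisp / crisp.sum(), fuzzy / fuzzy.sum()

frames = np.random.randn(300, 13)                   # e.g. 13-D MFCC frames of one segment
codebook = KMeans(n_clusters=8, n_init=10).fit(frames).cluster_centers_
crisp, fuzzy = bow_histograms(frames, codebook)     # both are fixed-length vectors
```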

    The selective use of gaze in automatic speech recognition

    The performance of automatic speech recognition (ASR) degrades significantly in natural environments compared to laboratory conditions. As a major source of interference, acoustic noise affects speech intelligibility during the ASR process. Acoustic noise causes two main problems: the first is contamination of the speech signal; the second is changes in the speakers' vocal and non-vocal behaviour. These phenomena create a mismatch between the ASR training and recognition conditions, which leads to considerable performance degradation. To improve noise-robustness, exploiting prior knowledge of the acoustic noise in speech enhancement, feature extraction and recognition models is a popular approach. An alternative approach presented in this thesis is to introduce eye gaze as an extra modality. Eye gaze behaviours play roles in interaction and contain information about cognition and visual attention; not all behaviours are relevant to speech. Therefore, gaze behaviours are used selectively to improve ASR performance. This is achieved by inference procedures using noise-dependent models of gaze behaviours and their temporal and semantic relationship with speech. 'Selective gaze-contingent ASR' systems are proposed and evaluated on a corpus of eye movement and related speech recorded in both clean and noisy environments. The best performing systems utilise both acoustic and language model adaptation.
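
    Purely as a hypothetical illustration of the "selective" idea (the thesis's actual inference procedures are model-based), the sketch below rescores ASR hypotheses with a gaze-derived bonus only when the estimated noise level is high; all names, weights and the gating rule are assumptions.

```python
def rescore(hypotheses, fixated_words, noise_level, boost=0.5, noise_gate=0.6):
    """Pick the best hypothesis, adding a gaze bonus only under high noise."""
    rescored = []
    for text, acoustic_score in hypotheses:
        gaze_bonus = 0.0
        if noise_level > noise_gate:                # selective: gaze used only in noise
            overlap = set(text.lower().split()) & fixated_words
            gaze_bonus = boost * len(overlap)
        rescored.append((text, acoustic_score + gaze_bonus))
    return max(rescored, key=lambda h: h[1])

hyps = [("turn on the lamp", -12.3), ("turn on the lab", -12.1)]
print(rescore(hyps, fixated_words={"lamp"}, noise_level=0.8))  # gaze breaks the tie
```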

    Detecting emotions from speech using machine learning techniques

    D.Phil. (Electronic Engineering)

    Deep learning for texture and dynamic texture analysis

    Texture is a fundamental visual cue in computer vision which provides useful information about image regions. Dynamic Texture (DT) extends the analysis of texture to sequences of moving scenes. Classic approaches to texture and DT analysis are based on shallow hand-crafted descriptors, including local binary patterns and filter banks. Deep learning, and in particular Convolutional Neural Networks (CNNs), has significantly contributed to the field of computer vision in the last decade. These biologically inspired networks, trained with powerful algorithms, have largely improved the state of the art in various tasks such as digit, object and face recognition. This thesis explores the use of CNNs in texture and DT analysis, replacing classic hand-crafted filters with deep trainable filters. An introduction to deep learning is provided in the thesis, as well as a thorough review of texture and DT analysis methods. While CNNs present interesting features for the analysis of textures, such as a dense extraction of filter responses trained end-to-end, the deepest layers used in the decision rules commonly learn to detect large shapes and image layout instead of local texture patterns. A CNN architecture is therefore adapted to textures by using an orderless pooling of intermediate layers to discard the overall shape analysis, resulting in a reduced computational cost and improved accuracy. An application to biomedical texture images is proposed in which large tissue images are tiled and combined in a recognition scheme. An approach is also proposed for DT recognition using the developed CNNs on three orthogonal planes to combine spatial and temporal analysis. Finally, a fully convolutional network is adapted to texture segmentation based on the same idea of discarding the overall shape and combining local shallow features with larger and deeper features.
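
    The orderless pooling idea can be sketched as follows: an intermediate convolutional layer of a pretrained CNN is pooled over all spatial positions, discarding layout and keeping only local texture statistics; the backbone, cut-off layer and classifier head below are illustrative assumptions, not the thesis architecture.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class OrderlessTextureNet(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        vgg = models.vgg16(weights=None)       # backbone choice is an assumption
        self.features = vgg.features[:17]      # keep layers up to the third conv block
        self.pool = nn.AdaptiveAvgPool2d(1)    # orderless pooling over spatial positions
        self.classifier = nn.Linear(256, num_classes)

    def forward(self, x):
        f = self.features(x)                   # (B, 256, H', W') local filter responses
        v = self.pool(f).flatten(1)            # (B, 256) layout-free texture descriptor
        return self.classifier(v)

model = OrderlessTextureNet(num_classes=5)
scores = model(torch.randn(2, 3, 224, 224))    # any input above the receptive field works
```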

    Automated Change Detection in Privacy Policies

    Privacy policies notify Internet users about the privacy practices of websites, mobile apps, and other products and services. However, users rarely read them and struggle to understand their contents, and the entities that provide these policies are sometimes unmotivated to make them comprehensible. Due to the complicated nature of these documents, it becomes even harder for users to notice and understand any changes of interest or concern when policies are revised. With recent developments in machine learning and natural language processing, tools have been developed that can automatically annotate the sentences of policies. These annotations can help a user quickly identify and understand relevant parts of the policy. Similarly, a tool can be developed that identifies changes between different versions of a policy that may be informative for the user; for example, if a website's new policy states that it will start sharing audio data as well, the proposed tool can make users aware of such an important change. This thesis presents a tool that takes two different versions of a privacy policy as input, matches the sentences of one version to the sentences of the other based on semantic similarity, and informs the user of key relevant changes between two matched sentences. We discuss different supervised machine learning models that are explored to develop a method for annotating the sentences of privacy policies according to expert-identified categories, for organization and analysis of the contents. Different word-embedding and similarity techniques are explored and evaluated to develop a method for matching the sentences of one version of the policy to those of another version. The annotations of the sentences are used to increase the efficiency of the matching process. Methods to detect changes between two matched sentences through analysis of sentence structure are then implemented. We combine the developed methods for annotating policies, matching the sentences between two versions of a policy, and detecting changes between sentences to realize the proposed tool. This research not only shows the potential of machine learning and natural language processing as important tools for privacy engineering but also introduces various techniques that can be applied to any natural-language document.
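
    As a minimal sketch of the matching step, assuming TF-IDF vectors in place of the word embeddings explored in the thesis, each sentence of the new policy version is paired with the most semantically similar sentence of the old version; the example sentences are invented.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

old = ["We collect your location data.",
       "We do not share data with third parties."]
new = ["We collect your location and audio data.",
       "We may share data with advertising partners."]

vec = TfidfVectorizer().fit(old + new)                      # shared vocabulary
sims = cosine_similarity(vec.transform(new), vec.transform(old))
for i, j in enumerate(sims.argmax(axis=1)):                 # best old match per new sentence
    print(f"new[{i}] matches old[{j}] (similarity {sims[i, j]:.2f})")
```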