
    Audio-coupled video content understanding of unconstrained video sequences

    Unconstrained video understanding is a difficult task. The main aim of this thesis is to recognise the nature of objects, activities and environment in a given video clip using both audio and video information. Traditionally, audio and video information have not been applied together to solve such complex tasks, and for the first time we propose, develop, implement and test a new framework of multi-modal (audio and video) data analysis for context understanding and labelling of unconstrained videos. The framework relies on feature selection techniques and introduces a novel algorithm (PCFS) that is faster than the well-established SFFS algorithm. We use the framework to study the benefits of combining audio and video information in a number of different problems. We begin by developing two independent content recognition modules. The first is based on image sequence analysis alone, and uses a range of colour, shape, texture and statistical features from image regions with a trained classifier to recognise the identity of the objects, activities and environment present. The second module uses audio information only, and recognises activities and environment. Both approaches are preceded by detailed pre-processing to ensure that correct video segments containing both audio and video content are present, and that the developed system is robust to changes in camera movement, illumination, random object behaviour, etc. For both audio and video analysis, we use a hierarchical, multi-stage classification approach so that difficult classification tasks can be decomposed into simpler, smaller ones. When combining the two modalities, we compare fusion techniques at different levels of integration and propose a novel algorithm that combines the advantages of feature-level and decision-level fusion. The analysis is evaluated on a large amount of test data comprising unconstrained videos collected for this work. Finally, we propose a decision correction algorithm which shows that taking further steps towards effectively combining multi-modal classification information with semantic knowledge generates the best possible results.
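    As an illustration of the two fusion levels compared in this thesis, here is a minimal numpy sketch (a generic illustration, not the thesis's PCFS or hybrid fusion algorithm): feature-level fusion concatenates the audio and video feature vectors before a single classifier is trained, while decision-level fusion combines the per-class posteriors of independently trained classifiers.

```python
import numpy as np

def feature_level_fusion(audio_feats, video_feats):
    # Concatenate per-segment feature vectors into one joint vector,
    # which a single classifier is then trained on.
    return np.concatenate([audio_feats, video_feats], axis=-1)

def decision_level_fusion(audio_probs, video_probs, w_audio=0.5):
    # Weighted sum of per-class posteriors from two independent classifiers.
    return w_audio * audio_probs + (1.0 - w_audio) * video_probs

# Illustrative posteriors over 3 classes from each modality.
audio_p = np.array([0.7, 0.2, 0.1])
video_p = np.array([0.4, 0.5, 0.1])
fused = decision_level_fusion(audio_p, video_p, w_audio=0.6)
label = int(np.argmax(fused))
```

    The weight `w_audio` here is an illustrative hyperparameter; in practice it would be tuned on validation data per classification task.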

    On the Data Efficiency and Model Complexity of Visual Learning

    Computer vision is a research field that aims to automate the process of gaining abstract understanding from digital images or videos. The recent rapid development of deep neural networks has demonstrated human-level performance or beyond on many vision tasks that require high-level understanding, such as image recognition and object detection. However, training deep neural networks usually requires large-scale datasets annotated by humans, and the models typically have millions of parameters and consume substantial computational resources. The issues of data efficiency and model complexity are commonly observed in many frameworks based on deep neural networks, limiting their deployment in real-world applications. In this dissertation, I present our research addressing the data efficiency and model complexity of deep neural networks. For data efficiency, (i) we study the problem of few-shot image recognition, where the training datasets are limited to only a few examples per category. (ii) We also investigate semi-supervised visual learning, which provides unlabeled samples in addition to the annotated dataset and aims to utilize them to learn better models. For model complexity, (iii) we seek alternatives to cascading layers or blocks for improving the representation capacity of convolutional neural networks without introducing additional computation. (iv) We improve the computational resource utilization of deep neural networks by finding, reallocating, and rejuvenating underutilized neurons. (v) We present two techniques for object detection that reuse computations to reduce architectural complexity and improve detection performance. (vi) Finally, we show our work on reusing visual features for multi-task learning to improve computational efficiency and share training information between different tasks.
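    The semi-supervised setting in (ii) can be sketched with a minimal pseudo-labelling loop (a standard generic technique, not necessarily the dissertation's method): a simple nearest-centroid classifier is fit on the labelled set, its confident predictions on unlabelled samples are adopted as pseudo-labels, and the model is refit on the augmented set.

```python
import numpy as np

def nearest_centroid_fit(X, y):
    # One centroid per class label.
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def predict_with_conf(centroids, X):
    classes = sorted(centroids)
    # Distance from every sample to every class centroid.
    d = np.stack([np.linalg.norm(X - centroids[c], axis=1) for c in classes], axis=1)
    idx = d.argmin(axis=1)
    # Confidence: margin between best and second-best distance.
    margins = np.sort(d, axis=1)
    conf = margins[:, 1] - margins[:, 0]
    return np.array(classes)[idx], conf

def pseudo_label(X_lab, y_lab, X_unlab, threshold=0.5):
    centroids = nearest_centroid_fit(X_lab, y_lab)
    preds, conf = predict_with_conf(centroids, X_unlab)
    keep = conf >= threshold
    # Refit on labelled data plus confident pseudo-labelled samples;
    # low-confidence unlabelled samples are discarded.
    X_aug = np.vstack([X_lab, X_unlab[keep]])
    y_aug = np.concatenate([y_lab, preds[keep]])
    return nearest_centroid_fit(X_aug, y_aug)
```

    The nearest-centroid model and the margin-based confidence threshold are illustrative placeholders; semi-supervised deep learning applies the same keep-only-confident-predictions idea to neural network outputs.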

    Identification and Characterization of Ethanol Responsive Genes in Acute Ethanol Behaviors in Caenorhabditis elegans

    Alcohol abuse and dependence are complex disorders that are influenced by many genetic and environmental factors. Acute behavioral responses to ethanol have predictive value for determining an individual’s long-term susceptibility to alcohol abuse and dependence. These behavioral responses are strongly influenced by genetics. Here, we have explored the role of genetic influences on acute behavioral responses to ethanol using the nematode worm Caenorhabditis elegans. First, we explored the role of ethanol metabolism in acute behavioral responses to ethanol. Natural variation in the human ethanol metabolism machinery is one of the most reported and reproducible associations found to alter drinking behavior. Ethanol metabolism is conserved across phyla, and alteration of this pathway alters acute behavioral responses to ethanol in humans, mice, rats, and flies. We have extended these findings to the worm and have shown that loss of either alcohol dehydrogenase or aldehyde dehydrogenase results in an increase in sensitivity to the acute effects of ethanol. Second, we explored the influence of differences in basal and ethanol-induced gene expression on ethanol responsive behaviors. We identified a set of candidate genes using the basal gene expression differences in npr-1(ky13) mutant animals to enrich for genes involved in acute functional tolerance (AFT). This analysis revealed ethanol-induced changes in the expression of genes involved in a variety of biological processes, including lipid metabolism. We focused on a gene involved in the metabolism of fatty acids, acs-2. acs-2 encodes an acyl-CoA synthetase that activates fatty acids for mitochondrial beta-oxidation. Animals carrying mutant acs-2 have significantly reduced AFT, and we explored the role of genes in the mitochondrial beta-oxidation pathway in alterations of ethanol responsive behaviors. We have shown that knockdown of ech-6, an enoyl-CoA hydratase, enhances the development of AFT. This work has uncovered a role for fatty acid utilization pathways in acute ethanol responses, and we suggest that natural variation in these pathways in humans may impact the acute behavioral responses to alcohol that in turn influence susceptibility to alcohol abuse and dependence.

    Usable Security for Wireless Body-Area Networks

    We expect wireless body-area networks of pervasive wearable devices to enable in situ health monitoring, personal assistance, entertainment personalization, and home automation. As these devices become ubiquitous, we also expect them to interoperate. That is, instead of closed, end-to-end body-worn sensing systems, we envision standardized sensors that wirelessly communicate their data to a device many people already carry today: the smart phone. However, this ubiquity of wireless sensors, combined with the characteristics they sense, presents many security and privacy problems. In this thesis we describe solutions to two of these problems. First, we evaluate the use of bioimpedance for recognizing who is wearing these wireless sensors and show that bioimpedance is a feasible biometric. Second, we investigate the use of accelerometers for verifying whether two of these wireless sensors are on the same person and show that our method is successful at distinguishing between sensors on the same body and on different bodies. We stress that any solution to these problems must be usable, meaning the user should not have to do anything but attach the sensors to their body and have them just work. These methods solve interesting problems in their own right, but it is their combination that shows their true power. Together they allow a network of wireless sensors to cooperate and determine whom they are sensing, even though only one of the wireless sensors might be able to determine this fact on its own. If all the wireless sensors know they are on the same body as each other and one of them knows which person it is on, then each can exploit the transitive relationship to know that they must all be on that person’s body. We show how these methods can work together in a prototype system. This ability to operate unobtrusively, collecting in situ data and labeling it properly without interrupting the wearer’s activities of daily life, will be vital to the success of these wireless sensors.
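    The same-body verification idea can be sketched with a simple correlation test (an illustrative baseline, not the thesis's actual method): per-sample accelerometer magnitudes are orientation-independent, and two sensors carried on the same body should produce strongly correlated motion signals (e.g., from gait), while sensors on different bodies should not.

```python
import numpy as np

def accel_magnitude(xyz):
    # Orientation-independent signal: per-sample acceleration magnitude
    # from an (n, 3) array of x/y/z accelerometer readings.
    return np.linalg.norm(xyz, axis=1)

def same_body_score(accel_a, accel_b):
    # Normalized correlation at zero lag between the two magnitude signals
    # (assumes the two streams are already time-synchronized).
    a = accel_magnitude(accel_a)
    b = accel_magnitude(accel_b)
    a = (a - a.mean()) / (a.std() + 1e-12)
    b = (b - b.mean()) / (b.std() + 1e-12)
    return float(np.mean(a * b))

def on_same_body(accel_a, accel_b, threshold=0.7):
    # The threshold is an illustrative choice; a deployed system would
    # calibrate it against a labeled corpus of same/different-body traces.
    return same_body_score(accel_a, accel_b) > threshold
```

    The zero-lag correlation assumes synchronized clocks; a more robust variant would search over small time lags before thresholding.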

    LUCKY DCT AGGREGATION FOR CAMERA SHAKE REMOVAL

    We consider the task of removing the effect of camera shake during a long exposure. Technically, this is a blind deconvolution problem in which both the image and the motion blur have to be jointly inferred. Several algorithms have been proposed to date for removing camera shake that work with one or more images. However, most of these algorithms are computationally expensive and hence cannot be used in real time. In this work, we propose a simple and cheap algorithm that can effectively recover the original sharp image from multiple burst images (captured using the burst modality of modern cameras). In summary, we pick selected images from the burst (using ideas from lucky imaging), which are then aggregated using the discrete cosine transform (similar to the idea of Fourier burst accumulation). We present some preliminary results and comparisons to demonstrate the effectiveness of the proposal.
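    The aggregation step can be sketched as follows (a minimal illustration assuming pre-registered grayscale frames; the exponent `p` and the normalisation are illustrative choices, not the paper's exact settings): each frame's DCT coefficients are weighted by their relative magnitude raised to a power, so that at every spatial frequency the least-blurred frames dominate the fused result, mirroring the Fourier burst accumulation idea in the DCT domain.

```python
import numpy as np
from scipy.fft import dctn, idctn

def dct_burst_aggregate(burst, p=7):
    # burst: sequence of registered grayscale frames, each (h, w).
    # Transform every frame to the DCT domain.
    coeffs = np.stack([dctn(f, norm='ortho') for f in burst])
    # Per-frequency weights: relative coefficient magnitude raised to p.
    # Larger p pushes the aggregation towards a per-frequency "pick the
    # sharpest frame" rule; p = 0 would reduce to a plain average.
    mag = np.abs(coeffs) ** p
    weights = mag / (mag.sum(axis=0, keepdims=True) + np.finfo(float).tiny)
    fused = (weights * coeffs).sum(axis=0)
    return idctn(fused, norm='ortho')
```

    Frame selection via lucky imaging (keeping only the sharpest burst members before aggregation) would run ahead of this step and is omitted here.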

    Bowdoin Orient v.79, no.1-26 (1949-1950)

    https://digitalcommons.bowdoin.edu/bowdoinorient-1950s/1000/thumbnail.jp

    Addressing subjectivity in the classification of palaeoenvironmental remains with supervised deep learning convolutional neural networks

    Archaeological object identifications have traditionally been undertaken through a comparative methodology, where each artefact is identified through a subjective, interpretative act by a professional. For palaeoenvironmental remains, this comparative methodology is given boundaries by reference materials and codified sets of rules, but subjectivity is nevertheless present. The problem with this traditional archaeological methodology is that a higher level of subjectivity in the identification of artefacts leads to inaccuracies, which in turn increase the potential for Type I and Type II errors in the testing of hypotheses. Reducing the subjectivity of archaeological identifications would improve the statistical power of archaeological analyses, which would subsequently lead to more impactful research. In this thesis, it is shown that the level of subjectivity in palaeoenvironmental research can be reduced by applying deep learning convolutional neural networks within an image recognition framework. The primary aim of the presented research is therefore to further the on-going paradigm shift in archaeology towards model-based object identifications, particularly within the realm of palaeoenvironmental remains. Although this thesis focuses on the identification of pollen grains and animal bones, with the latter restricted to the astragalus of sheep and goats, there are wider implications for archaeology, as these methods can easily be extended beyond pollen and animal remains. The previously published POLEN23E dataset is used as the pilot study for applying deep learning to pollen grain classification. In contrast, an image dataset of modern bones was compiled for the classification of sheep and goat astragali, due to a complete lack of available bone image datasets, and a double-blind study with inexperienced and experienced zooarchaeologists was performed to provide a benchmark against which image recognition models can be compared. In both classification tasks, the presented models outperform all previous formal modelling methods, and only the best human analysts match the performance of the deep learning model in the sheep and goat astragalus separation task. Throughout the thesis, there is a specific focus on increasing trust in the models through visualization of the models’ decision making, and avenues for improving Grad-CAM are explored. This thesis makes an explicit case for phasing out comparative methods in favour of a formal modelling framework within archaeology, especially in palaeoenvironmental object identification.
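    For context, the core of the Grad-CAM visualization this thesis builds on can be sketched in a few lines of numpy, given the last convolutional layer's activations and the gradients of the target class score with respect to them (here passed in directly; in practice both come from a deep learning framework's forward and backward passes):

```python
import numpy as np

def grad_cam(activations, gradients):
    # activations: (c, h, w) feature maps from the last conv layer.
    # gradients:   (c, h, w) gradients of the target class score with
    #              respect to those feature maps.
    # Channel weights: global-average-pool the gradients over space.
    weights = gradients.mean(axis=(1, 2))             # (c,)
    # Weighted sum of feature maps across channels.
    cam = np.tensordot(weights, activations, axes=1)  # (h, w)
    cam = np.maximum(cam, 0.0)                        # ReLU: keep positive evidence
    if cam.max() > 0:
        cam = cam / cam.max()                         # normalize to [0, 1] for display
    return cam
```

    The resulting heat map, upsampled to the input resolution, highlights which image regions drove a given classification, which is the mechanism for model trust that the thesis examines and extends.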

    Bowdoin Orient v.71, no.1-26 (1941-1942)

    https://digitalcommons.bowdoin.edu/bowdoinorient-1940s/1002/thumbnail.jp