445 research outputs found

    Spatiotemporal Facial Super-Pixels for Pain Detection


    Analysis of remote sensing images captured with a liquid-lens objective using super-resolution methods

    The electronic version of this thesis does not include the publications.
    In this thesis, both hardware and software solutions for image enhancement were studied. On the hardware side, a new liquid lens design with a dielectric elastomer stack actuator (DESA) membrane located directly in the optical path was demonstrated. Two prototypes with two different DESAs, with active areas of 40 and 20 mm in diameter, were developed. The lens performance was consistent with the mechanics of elastomer deformation and the relative focal-length changes. A laser beam was used to show the change in the meniscus and to measure the focal length of the lens. The experimental results demonstrate that a voltage in the range of 50 to 750 V is required to change the meniscus.

    On the software side, a new satellite image enhancement system was proposed. The proposed technique decomposed the noisy input image into various frequency subbands using the DT-CWT. After removing the noise with the LA-BSF technique, the resolution was enhanced by employing the DWT and interpolating the high-frequency subband images. The original image was interpolated with half of the interpolation factor used for the high-frequency subband images, and the super-resolved image was reconstructed using the IDWT.

    A novel single-image SR method based on generating a dictionary from pairs of HR images and their corresponding LR images was also proposed. First, the HR and LR pairs were divided into patches to build HR and LR dictionaries, respectively. The initial HR representation of an input LR image was calculated by combining HR patches. These HR patches were chosen from the HR dictionary as those whose corresponding LR patches are closest to the patches of the input LR image. Each selected HR patch was then passed through an illumination-enhancement step to reduce noticeable illumination changes between neighbouring patches in the super-resolved image. To reduce the blocking effect, the average of the obtained SR image and the bicubic-interpolated image was calculated.

    New sampling kernels that yield a sharper SR image were also proposed. To demonstrate the effectiveness of the proposed kernels, the resolution-enhancement techniques from [83] and [50] were adopted. The super-resolved image was obtained by combining the HR images produced by each of the proposed kernels using alpha blending. The proposed techniques and kernels were compared with various conventional and state-of-the-art techniques; the quantitative test results and visual assessment of the final image quality show the superiority of the proposed techniques and kernels over conventional and state-of-the-art techniques.
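The patch-matching step of the dictionary-based SR method can be sketched as follows. This is a minimal numpy illustration, not the thesis implementation: the toy dictionary contents, the 2x2 patch size, the x2 scale, and the nearest-neighbour (L2) distance are assumptions, and the illumination correction and bicubic averaging steps are omitted.

```python
import numpy as np

def nearest_hr_patch(lr_patch, lr_dict, hr_dict):
    """Pick the HR patch whose LR counterpart is closest (L2) to lr_patch."""
    dists = [np.linalg.norm(lr_patch - d) for d in lr_dict]
    return hr_dict[int(np.argmin(dists))]

def reconstruct(lr_image, lr_dict, hr_dict, p=2, scale=2):
    """Tile the LR input into p x p patches and paste the matched HR patches."""
    h, w = lr_image.shape
    sr = np.zeros((h * scale, w * scale))
    for i in range(0, h, p):
        for j in range(0, w, p):
            patch = lr_image[i:i + p, j:j + p]
            hr = nearest_hr_patch(patch, lr_dict, hr_dict)
            sr[i * scale:(i + p) * scale, j * scale:(j + p) * scale] = hr
    return sr

# Toy dictionary: 2x2 LR patches paired with 4x4 HR patches (scale 2).
rng = np.random.default_rng(0)
lr_dict = [rng.random((2, 2)) for _ in range(16)]
hr_dict = [np.kron(d, np.ones((2, 2))) for d in lr_dict]  # stand-in HR pairs

lr = lr_dict[3]                     # a 2x2 input that exists in the dictionary
sr = reconstruct(lr, lr_dict, hr_dict)
print(sr.shape)                     # (4, 4)
```

Because the input patch occurs in the dictionary, its matched HR patch is recovered exactly; on real images each input patch only approximates its nearest dictionary entry, which is why the thesis adds illumination correction between neighbouring patches.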

    Action recognition using single-pixel time-of-flight detection

    Action recognition is a challenging task that plays an important role in many robotic systems, which highly depend on visual input feeds. However, due to privacy concerns, it is important to find a method which can recognise actions without using a visual feed. In this paper, we propose a concept for detecting actions while preserving the test subject's privacy. Our proposed method relies only on recording the temporal evolution of light pulses scattered back from the scene. The data trace recording one action contains a sequence of one-dimensional arrays of voltage values acquired by a single-pixel detector at a 1 GHz repetition rate. Information about both the distance to the object and its shape is embedded in the traces. We apply machine learning in the form of recurrent neural networks for data analysis and demonstrate successful action recognition. The experimental results show that our proposed method achieves on average 96.47% accuracy on the actions walking forward, walking backwards, sitting down, standing up and waving a hand, using a recurrent neural network.
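A minimal sketch of the classification idea: a vanilla RNN in numpy consumes a 1-D voltage trace one sample per time step and produces scores over the five actions. The network architecture, sizes, and (random, untrained) weights are illustrative assumptions, not the paper's trained model.

```python
import numpy as np

def rnn_classify(trace, Wxh, Whh, Why, bh, by):
    """Run a vanilla RNN over a 1-D voltage trace and return class scores."""
    h = np.zeros(Whh.shape[0])
    for x in trace:                       # one voltage sample per time step
        h = np.tanh(Wxh * x + Whh @ h + bh)
    return Why @ h + by                   # logits over the action classes

rng = np.random.default_rng(0)
hidden, classes = 8, 5   # 5 actions: walk fwd/back, sit down, stand up, wave
Wxh = rng.normal(size=hidden) * 0.1
Whh = rng.normal(size=(hidden, hidden)) * 0.1
Why = rng.normal(size=(classes, hidden)) * 0.1
bh, by = np.zeros(hidden), np.zeros(classes)

trace = rng.normal(size=200)   # stand-in for a single-pixel detector trace
scores = rnn_classify(trace, Wxh, Whh, Why, bh, by)
action = int(np.argmax(scores))           # predicted action index
```

The recurrence is the key fit to this data: distance and shape information is spread along the temporal trace, so the hidden state must accumulate evidence across samples before a class decision is made.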

    Image-set, Temporal and Spatiotemporal Representations of Videos for Recognizing, Localizing and Quantifying Actions

    This dissertation addresses the problem of learning video representations, defined here as transforming the video so that its essential structure is made more visible or accessible for action recognition and quantification. In the literature, a video can be represented by a set of images, by modeling motion or temporal dynamics, or by a 3D graph with pixels as nodes. This dissertation contributes a set of models to localize, track, segment, recognize and assess actions: (1) image-set models via aggregating subset features given by regularizing normalized CNNs, (2) image-set models via inter-frame principal recovery and sparsely coding residual actions, (3) temporally local models with spatially global motion estimated by robust feature matching and local motion estimated by action detection with a motion model added, (4) spatiotemporal models (3D graphs and 3D CNNs) that model time as a space dimension, and (5) supervised hashing by jointly learning embedding and quantization. State-of-the-art performances are achieved for tasks such as quantifying facial pain and human diving.
    Primary conclusions of this dissertation are categorized as follows: (i) image sets can capture facial actions, which are about collective representation; (ii) sparse and low-rank representations can untangle expression, identity and pose cues, and can be learned via an image-set model as well as a linear model; (iii) norm is related to recognizability; similarity metrics and loss functions matter; (iv) combining the MIL-based boosting tracker with the particle filter motion model induces a good trade-off between appearance similarity and motion consistency; (v) segmenting an object locally makes it amenable to shape priors; it is feasible to learn knowledge such as shape priors online from Web data with weak supervision; (vi) representing videos as 3D graphs works locally in both space and time; 3D CNNs work effectively when given temporally meaningful clips; (vii) rich labeled images or videos help to learn better hash functions, after learning binary embedded codes, than random projections. In addition, the models proposed for videos can be adapted to other sequential images, such as volumetric medical images, which are not covered in this dissertation.
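The idea in (4) of treating time as a space dimension can be illustrated by a direct 3-D convolution over a (time, height, width) clip. This is a generic sketch of the operation, not the dissertation's network; the clip and kernel sizes are arbitrary toy values.

```python
import numpy as np

def conv3d(clip, kernel):
    """Valid 3-D convolution over (time, height, width), time as a space dim."""
    t, h, w = clip.shape
    kt, kh, kw = kernel.shape
    out = np.zeros((t - kt + 1, h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            for k in range(out.shape[2]):
                out[i, j, k] = np.sum(clip[i:i + kt, j:j + kh, k:k + kw] * kernel)
    return out

clip = np.random.default_rng(0).random((16, 8, 8))   # 16-frame toy clip
kernel = np.ones((3, 3, 3)) / 27                     # spatiotemporal average
feat = conv3d(clip, kernel)
print(feat.shape)                                    # (14, 6, 6)
```

Because the kernel spans three frames as well as a 3x3 spatial window, each output value mixes motion and appearance, which is why such models benefit from temporally meaningful input clips rather than arbitrarily sampled frames.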

    Facial affect "in the wild": a survey and a new database

    Well-established databases and benchmarks have been developed in the past 20 years for automatic facial behaviour analysis. Nevertheless, for some important problems regarding analysis of facial behaviour, such as (a) estimation of affect in a continuous dimensional space (e.g., valence and arousal) in videos displaying spontaneous facial behaviour and (b) detection of the activated facial muscles (i.e., facial action unit detection), to the best of our knowledge, well-established in-the-wild databases and benchmarks do not exist. That is, the majority of the publicly available corpora for the above tasks contain samples that have been captured in controlled recording conditions and/or captured under a very specific milieu. Arguably, in order to make further progress in automatic understanding of facial behaviour, datasets that have been captured in-the-wild and in various milieus have to be developed. In this paper, we survey the progress that has recently been made on understanding facial behaviour in-the-wild and the datasets and methodologies that have been developed so far, paying particular attention to deep learning techniques for the task. Finally, we make a significant step further and propose a new comprehensive benchmark for training methodologies, as well as for assessing the performance of facial affect/behaviour analysis and understanding in-the-wild. To the best of our knowledge, this is the first time that such a benchmark for valence and arousal "in-the-wild" is presented.