8 research outputs found

    Extended SRC: Undersampled Face Recognition via Intraclass Variant Dictionary

    Robust Modeling of Epistemic Mental States and Their Applications in Assistive Technology

    This dissertation presents the design and implementation of EmoAssist: Emotion-Enabled Assistive Tool to Enhance Dyadic Conversation for the Blind. The key functionalities of the system are to recognize behavioral expressions, to predict 3-D affective dimensions from visual cues, and to provide audio feedback to the visually impaired in a natural environment. Prior to describing EmoAssist, this dissertation identifies and advances research challenges in the analysis of facial features and their temporal dynamics with respect to epistemic mental states in dyadic conversation. A number of statistical analyses and simulations were performed to answer important research questions about the complex interplay between facial features and mental states. It was found that the relations are mostly non-linear rather than linear. Based on this analysis, a portable prototype of an assistive technology that can help a blind individual understand his or her interlocutor's mental states was designed. A number of challenges related to the system, communication protocols, error-free face tracking, and robust modeling of behavioral expressions/affective dimensions were addressed to make EmoAssist effective in real-world scenarios. In addition, orientation-sensor information from the phone was used to correct image alignment and improve robustness in real-life deployment. It was observed that EmoAssist can predict affective dimensions with acceptable accuracy (maximum correlation coefficient for valence: 0.76, arousal: 0.78, and dominance: 0.76) in natural conversation. The overall minimum and maximum response times are 64.61 milliseconds and 128.22 milliseconds, respectively. The integration of sensor information for orientation correction yielded a significant improvement (16% on average) in the accuracy of recognizing behavioral expressions. A user study with ten blind participants shows that EmoAssist is highly acceptable to them in social interaction (average acceptability rating on a Likert scale: 6.0, where 1 and 7 are the lowest and highest possible ratings, respectively).
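    As an illustration of the orientation-correction step, the following minimal Python/OpenCV sketch rotates a camera frame by the phone's roll angle before face tracking. The function name, the use of OpenCV, and the sign convention for the angle are assumptions made here for illustration and are not taken from the dissertation.

    import cv2
    import numpy as np

    def correct_orientation(frame: np.ndarray, roll_deg: float) -> np.ndarray:
        # Rotate the frame so the face appears upright despite the phone's tilt,
        # using the roll angle reported by the device's orientation sensor
        # (sign convention assumed: positive roll is compensated by a negative rotation).
        h, w = frame.shape[:2]
        M = cv2.getRotationMatrix2D((w / 2.0, h / 2.0), -roll_deg, 1.0)
        return cv2.warpAffine(frame, M, (w, h), flags=cv2.INTER_LINEAR)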

    Exploiting Sparsity for Registration of Brain Tumor MR Images

    In medical imaging, sparsity has been used in the acquisition and reconstruction of MRI images, in image denoising, and in face recognition, among other applications. The aim of this thesis is to assess whether exploiting sparsity is a desirable property in the problem of brain tumor image registration. To this end, we consider tumor mass effect and tumor infiltration as two different tumor growth effects. In intensity-based nonrigid image registration, an optimization problem is defined by the minimization of a cost function with respect to the transformation parameters. This cost function consists of a dissimilarity term between the images being registered and a term that regularizes the transformation. Within this thesis, a modified L1 norm dissimilarity measure and a modified L1 regularization term are constructed. We compare the performance of different algorithms that combine these contributions with an L2 norm dissimilarity measure and a diffusion regularizer, for three different transformation models. The methods are tested on simulated brain tumor MR images, and the registration is validated by computing two dissimilarity distances between the deformation field obtained and a simulated ground truth. Results show that algorithms that use the modified L1 regularizer together with an L2 dissimilarity measure recover the deformation of the tumor, while algorithms that use the modified L1 norm dissimilarity measure fail to do so in some situations.
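    The cost function described above can be written, in a generic form consistent with the abstract (the exact modified-L1 dissimilarity and regularization terms are defined in the thesis and are not reproduced here), as

        E(\theta) = D\big(I_f,\; I_m \circ T_\theta\big) + \lambda\, R(T_\theta),

    where I_f and I_m are the fixed and moving images, T_\theta is the transformation with parameters \theta, and \lambda weights the regularizer. For the baseline combination, D is the L2 (sum-of-squared-differences) dissimilarity and R is the diffusion regularizer acting on the displacement field u_\theta:

        D_{L_2}(\theta) = \sum_{x} \big(I_f(x) - I_m(T_\theta(x))\big)^2, \qquad
        R_{\mathrm{diff}}(\theta) = \sum_{x} \|\nabla u_\theta(x)\|^2 .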

    Robust subspace learning for static and dynamic affect and behaviour modelling

    Machine analysis of human affect and behavior in naturalistic contexts has attracted growing attention in the last decade from various disciplines, ranging from the social and cognitive sciences to machine learning and computer vision. Endowing machines with the ability to seamlessly detect, analyze, model, and predict, as well as simulate and synthesize, manifestations of internal emotional and behavioral states in real-world data is deemed essential for the deployment of next-generation, emotionally and socially competent human-centered interfaces. In this thesis, we are primarily motivated by the problem of modeling, recognizing, and predicting spontaneous expressions of non-verbal human affect and behavior manifested through either low-level facial attributes in static images or high-level semantic events in image sequences. Both visual data and annotations of naturalistic affect and behavior naturally contain noisy measurements of unbounded magnitude at random locations, commonly referred to as ‘outliers’. We present machine learning methods that are robust to such gross, sparse noise. First, we deal with static analysis of face images, viewing the latter as a superposition of mutually incoherent, low-complexity components corresponding to facial attributes such as facial identity, expressions, and activation of atomic facial muscle actions. We develop a robust, discriminant dictionary learning framework to extract these components from grossly corrupted training data and combine it with sparse representation to recognize the associated attributes. We demonstrate that our framework can jointly address interrelated classification tasks such as face and facial expression recognition. Inspired by the well-documented importance of the temporal aspect in perceiving affect and behavior, we direct the bulk of our research efforts to continuous-time modeling of dimensional affect and social behavior. Having identified a gap in the literature, namely the lack of data containing annotations of social attitudes in continuous time and scale, we first curate a new audio-visual database of multi-party conversations from political debates, annotated frame-by-frame in terms of real-valued conflict intensity, and use it to conduct the first study on continuous-time conflict intensity estimation. Our experimental findings corroborate previous evidence indicating the inability of existing classifiers to capture the hidden temporal structure of affective and behavioral displays. We present a novel dynamic behavior analysis framework which models temporal dynamics in an explicit way, based on the natural assumption that continuous-time annotations of smoothly varying affect or behavior can be viewed as outputs of a low-complexity linear dynamical system when behavioral cues (features) act as system inputs (see the state-space sketch below). A novel robust structured rank minimization framework is proposed to estimate the system parameters in the presence of gross corruptions and partially missing data. Experiments on prediction of dimensional conflict and affect, as well as on multi-object tracking from detections, validate the effectiveness of our predictive framework and demonstrate for the first time that complex human behavior and affect can be learned and predicted from small training sets of person(s)-specific observations.
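    The dynamic model referred to above can be sketched as a standard linear dynamical system with inputs; this generic state-space form is consistent with the description in the abstract, while the robust structured rank minimization used to estimate its parameters under gross corruptions is specific to the thesis and is not reproduced here:

        x_{t+1} = A x_t + B u_t, \qquad y_t = C x_t + D u_t,

    where u_t are the behavioral cues (features) acting as system inputs, y_t are the continuous-time affect or behavior annotations acting as outputs, and x_t is a low-dimensional latent state whose evolution is governed by the matrices (A, B, C, D).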

    Higher dimensional time-energy entanglement

    Judging by the compelling number of innovations based on taming quantum mechanical effects, such as the development of transistors and lasers, further research in this field promises to tackle further technological challenges in the years to come. This statement gains even more importance in the information processing scenario. Here, the growing rate of data generation and the correspondingly higher need for more efficient computational resources and secure high-bandwidth networks are central problems that need to be tackled. In this sense, the required CPU miniaturization makes the design of structures at atomic scales inevitable, as foreseen by Moore's law. From this perspective, it is necessary to concentrate further research efforts on controlling and manipulating quantum mechanical systems. This makes it possible, for example, to encode quantum superposition states in order to tackle problems that are computationally NP-hard and therefore cannot be solved efficiently by classical computers. The only limitation affecting these solutions is the low scalability of existing quantum systems. Similarly, quantum communication schemes are devised to certify the secure transmission of quantum information, but are still limited by a low transmission bandwidth. This thesis follows the direction defined by these research efforts and aims to further increase the scalability of the quantum mechanical systems required to perform these tasks. The method used here is to encode quantum states into photons generated by spontaneous parametric down-conversion (SPDC). An intrinsic limitation of photons is that the scalability of quantum information schemes employing them is bounded by the low detection efficiency of commercial single-photon detectors. This is addressed by encoding higher-dimensional quantum states into two photons, increasing the scalability of the scheme in comparison to multi-photon states. Furthermore, encoding quantum information in the emission-time degree of freedom improves its applicability to long-distance quantum communication schemes, thereby overcoming the intrinsic limitations of schemes based on encoding in the momentum and polarization degrees of freedom. This work presents results on a scalable experimental implementation of time-energy encoded higher-dimensional states, demonstrating the feasibility of the scheme. Further tools are defined and used to characterize the properties of the prepared quantum states, such as their entanglement, their dimension, and their preparation fidelity. Finally, the method of quantum state tomography is used to fully determine the underlying quantum states, at the cost of an increased measurement effort and thus operation time. It is at this point that results from the research field of compressed sensing help to decrease the necessary number of measurements. This scheme is compared with an adaptive tomography scheme designed to offer an additional reconstruction speedup. These results demonstrate the scalability of the scheme to bipartite dimensions higher than 2x8, equivalent to the encoding of quantum information into more than 6 qubits.
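    For the tomography step, one common compressed-sensing formulation (given here only as a generic illustration; the exact estimator and the adaptive scheme used in the thesis may differ) recovers a low-rank density matrix from a reduced set of measured expectation values p_i of observables E_i:

        \hat{\rho} = \arg\min_{\rho} \; \|\rho\|_{*} \quad \text{subject to} \quad \rho \succeq 0, \;\; \operatorname{tr}(\rho) = 1, \;\; \big|\operatorname{tr}(E_i \rho) - p_i\big| \le \varepsilon \;\; \forall i,

    where \|\cdot\|_{*} is the trace (nuclear) norm, whose minimization favors low-rank states and thereby reduces the number of measurement settings required.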

    Voice Modeling Methods for Automatic Speaker Recognition

    Building a voice model means capturing the characteristics of a speaker's voice in a data structure. This data structure is then used by a computer for further processing, such as comparison with other voices. Voice modeling is a vital step in the process of automatic speaker recognition, which itself is the foundation of several applied technologies: (a) biometric authentication, (b) speech recognition, and (c) multimedia indexing. Several challenges arise in the context of automatic speaker recognition. First, there is the problem of data shortage, i.e., the unavailability of sufficiently long utterances for speaker recognition. It stems from the fact that the speech signal conveys different aspects of the sound in a single, one-dimensional time series: linguistic (what is said?), prosodic (how is it said?), individual (who said it?), locational (where is the speaker?), and emotional features of the speech sound itself (to name a few) are contained in the speech signal, along with acoustic background information. To analyze a specific aspect of the sound regardless of the other aspects, analysis methods have to be applied to a specific time scale (length) of the signal at which this aspect stands out from the rest. For example, linguistic information (i.e., which phone or syllable has been uttered?) is found in very short time spans of only milliseconds in length. On the contrary, speaker-specific information emerges more clearly the longer the analyzed sound is. Long utterances, however, are not always available for analysis. Second, the speech signal is easily corrupted by background sound sources (noise, such as music or sound effects). If present, their characteristics tend to dominate a voice model, such that model comparison may then be driven mainly by background features instead of speaker characteristics. Current automatic speaker recognition works well under relatively constrained circumstances, such as studio recordings, or when prior knowledge of the number and identity of the occurring speakers is available. Under more adverse conditions, such as in feature films or amateur material on the web, the achieved speaker recognition scores drop below a rate that is acceptable for an end user or for further processing. For example, the typical speaker turn duration of only one second and the sound-effect background in cinematic movies render most current automatic analysis techniques useless. In this thesis, methods for voice modeling that are robust with respect to short utterances and background noise are presented. The aim is to facilitate movie analysis with respect to the occurring speakers. Therefore, algorithmic improvements are suggested that (a) improve the modeling of very short utterances, (b) facilitate voice model building even in the case of severe background noise, and (c) allow for efficient voice model comparison to support the indexing of large multimedia archives. The proposed methods improve the state of the art in terms of recognition rate and computational efficiency. Going beyond selective algorithmic improvements, subsequent chapters also investigate what is lacking in principle in current voice modeling methods. A study with human subjects shows that the exclusion of time-coherence information from a voice model induces an artificial upper bound on the recognition accuracy of automatic analysis methods. A proof-of-concept implementation confirms the usefulness of exploiting this kind of information by halving the error rate. This result questions the general speaker modeling paradigm of the last two decades and points to a promising new direction. The approach taken to arrive at these results is based on a novel methodology of algorithm design and development called "eidetic design". It uses a human-in-the-loop technique that analyzes existing algorithms in terms of their abstract intermediate results. The aim is to detect flaws or failures in them intuitively and to suggest solutions. The intermediate results often consist of large matrices of numbers whose meaning is not clear to a human observer. Therefore, the core of the approach is to transform them into a suitable domain of perception (such as, e.g., the auditory domain of speech sounds in the case of speech feature vectors) where their content, meaning, and flaws are intuitively clear to the human designer. This methodology is formalized, and the corresponding workflow is explicated through several use cases. Finally, the use of the proposed methods in video analysis and retrieval is presented. This demonstrates the applicability of the developed methods and the accompanying software library sclib by means of improved results obtained with a multimodal analysis approach. The source code of sclib is available to the public upon request to the author. A summary of the contributions, together with an outlook on short- and long-term future work, concludes this thesis.
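    As a minimal illustration of what "building a voice model" can look like in code, the sketch below fits a Gaussian mixture model to per-frame acoustic features (e.g., MFCC vectors) of one speaker and scores new utterances against it. A GMM is only a standard baseline chosen here for illustration; the feature extraction, the number of components, and the scoring rule are assumptions and do not reproduce the models developed in the thesis.

    import numpy as np
    from sklearn.mixture import GaussianMixture

    def train_voice_model(features: np.ndarray, n_components: int = 16) -> GaussianMixture:
        # features: (n_frames, n_dims) matrix of per-frame acoustic features of one speaker.
        gmm = GaussianMixture(n_components=n_components, covariance_type="diag", max_iter=200)
        gmm.fit(features)
        return gmm

    def match_score(model: GaussianMixture, features: np.ndarray) -> float:
        # Average per-frame log-likelihood of an utterance under the speaker model;
        # higher values indicate a closer match to that speaker's voice.
        return float(model.score(features))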

    Face recognition breakthrough

    No full text