Realistic, Personalizable Audiovisual Synthesis
A unified framework for the realistic, personalizable audiovisual synthesis and analysis of audiovisual sequences of talking heads and visual sequences of sign language in a domestic environment is presented. The former are fully animated in synchrony with a text or speech source; the latter use finger spelling of words by hand. The framework's personalization capabilities make it easy for non-expert users to create audiovisual sequences. Possible applications range from realistic virtual avatars for natural interaction or video games to very-low-bandwidth videoconferencing and visual telephony for the hard of hearing, including aids to pronunciation and communication for that same group. Long sequences can be processed with very low resource consumption, especially in storage, thanks to a new incremental computation procedure for the singular value decomposition that also updates the mean. This procedure is complemented by three others: a decremental, a split, and a composition procedure.
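To make the storage argument concrete, the following is a minimal Python/NumPy sketch of an incremental SVD step that also tracks the data mean, in the style of the commonly used mean-updating subspace formulation (e.g. Ross et al.'s incremental subspace learning); the function name, the rank-truncation policy and the QR-based residual step are illustrative assumptions, not the thesis's exact procedure.

```python
import numpy as np

def incremental_svd_mean_update(U, S, mu, n, X_new, rank):
    """Fold a new block of columns X_new (d x m) into an existing
    rank-truncated SVD (U, S) of the mean-centred data seen so far,
    updating the running mean mu (length d) over n previous columns.
    Illustrative sketch; not the thesis's exact procedure."""
    d, m = X_new.shape
    mu_B = X_new.mean(axis=1)
    mu_new = (n * mu + m * mu_B) / (n + m)
    # Centre the new block at its own mean and append one extra column
    # that accounts for the shift between the old and new means.
    B = np.hstack([X_new - mu_B[:, None],
                   np.sqrt(n * m / (n + m)) * (mu_B - mu)[:, None]])
    # Split B into its projection onto the current subspace and the residual.
    P = U.T @ B
    R = B - U @ P
    Q, Rq = np.linalg.qr(R)
    # Small core matrix: old singular values plus the new information.
    r = S.size
    K = np.block([[np.diag(S), P],
                  [np.zeros((Rq.shape[0], r)), Rq]])
    Uk, Sk, _ = np.linalg.svd(K, full_matrices=False)
    U_new = np.hstack([U, Q]) @ Uk
    return U_new[:, :rank], Sk[:rank], mu_new, n + m
```

A first block would be initialised with an ordinary batch SVD of the data centred on its column mean; after that, each new group of frames is folded in at a cost independent of the sequence length, which is what keeps the storage footprint of long sequences small.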
Gestural human-machine interaction using neural networks for people with severe speech and motor impairment due to cerebral palsy
The long-term aim of this research is the development of a robust and appropriate method of high-efferent-bandwidth gestural human-machine interaction (HMI) that enhances and extends the multimodal expressive abilities of people with severe speech and motor impairment due to cerebral palsy (SSMICP). A human-factors-driven approach was adopted to generate and identify candidate behaviour for gestural HMI. Neural methods were applied to investigate the automatic recognition of human movement with a high noise component, using arm-movement data from subjects with spastic-athetoid cerebral palsy.
Human-machine interaction was considered an emergent property, leading to the development of a methodology based on human-human interaction to elicit a wide range of spontaneous or near-spontaneous gestures. Twelve subjects with SSMICP aged five to 18 years took part in a gestural-ability pilot study. Between 30 and 141 verbally presented concepts were used to elicit a wide range of spontaneous or near-spontaneous gestural responses, and subjects were encouraged to express each concept in any way they wished. Gestural ability frequently exceeded that anticipated by therapists, educators, parents and physicians; therapeutic, educational and medical records did not predict the gestural ability observed in the study. Analysis of video-taped sessions indicated that gestures were frequently articulated using multiple parts of the body. Nine out of ten subjects used either the right or left arm more frequently than any other body part.
Instrumented gestural data comprising a subset of 27 gestures from a 17-year-old subject with spastic-athetoid quadriplegia was used to investigate automatic gesture recognition. Co-articulated dynamic arm gestures were elicited in random order, and gestural data were recorded at 100 samples per second using a six-degree-of-freedom magnetic tracker attached distally to one forearm. The gestural data stream was examined using a simple body model developed in MATLAB and animated on a Silicon Graphics workstation. In the absence of suitable features for automatically segmenting the gestural data stream, gestures were segmented manually.
Low-pass filtering was used to remove "jerkiness", and data reduction was achieved through re-sampling. The use of time-delay feedforward neural networks was investigated, with features extracted over a fixed time interval as input; the neural network classifiers outperformed two k-nearest-neighbour methods. Time windows of 160 ms to 1120 ms were compared, and a span of 640 ms comprising four time samples yielded the optimum recognition rate. Feature sets containing measures of position, forearm orientation, scalar and vector velocity, curvature and plane of motion were compared; a feature set comprising four time samples of x, y, z position gave the highest recognition rate. Twelve gestures were recognised at or above 80%, with an average recognition rate of 90%; the maximum result over all 26 gestures was 55%. The results suggest that the fixed-time-window approach coupled with low-pass filtering may be a feasible method for the computer recognition of noisy gestural movement. They also show that it is possible for people classed by traditional assessment techniques as having no functional use of the upper extremities to produce a repertoire of dynamic arm gestures with sufficient consistency to be recognised by machine.
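As an illustration of this fixed-time-window pipeline, the Python sketch below low-pass filters one segmented gesture and builds the four-sample x, y, z feature vector; the cutoff frequency, the fourth-order Butterworth filter, the even sample spacing across the 640 ms span and the function name are all assumptions, since the abstract does not specify them.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def gesture_features(xyz, fs=100, span_ms=640, n_samples=4, cutoff_hz=5.0):
    """Turn one manually segmented gesture (T x 3 array of x, y, z tracker
    positions sampled at fs Hz) into a fixed-length feature vector:
    low-pass filter to remove "jerkiness", then keep n_samples points
    spread across a span_ms window. Cutoff and spacing are assumed."""
    b, a = butter(4, cutoff_hz, btype='low', fs=fs)   # assumed 4th-order filter
    smooth = filtfilt(b, a, xyz, axis=0)              # zero-phase filtering
    # Place four samples evenly across the 640 ms window (data reduction
    # by re-sampling); clip in case the gesture is shorter than the span.
    idx = np.linspace(0, span_ms / 1000 * fs, n_samples).astype(int)
    idx = np.clip(idx, 0, len(smooth) - 1)
    return smooth[idx].reshape(-1)   # 4 samples x 3 coords = 12 features
```

Vectors of this kind are the sort of fixed-interval input the abstract describes feeding to the time-delay feedforward networks and to the k-nearest-neighbour baselines against which they were compared.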