2,578 research outputs found

    Robotics 2010

    Without a doubt, robotics has made incredible progress over the last decades. The vision of developing, designing and creating technical systems that help humans accomplish hard and complex tasks has led to an incredible variety of solutions. Few technical fields exhibit as many interdisciplinary interconnections as robotics. This stems from the highly complex challenges posed by robotic systems, especially the requirement for intelligent and autonomous operation. This book tries to give an insight into the evolutionary process that takes place in robotics. It provides articles covering a wide range of this exciting area. The progress of technical challenges and concepts may illuminate the relationship between developments that seem to be completely different at first sight. Robotics remains an exciting scientific and engineering field. The community looks optimistically ahead and looks forward to future challenges and new developments

    Visualizing structures of speech expressiveness

    Multimodaalsel emotsioonide tuvastamisel põhineva inimese-roboti suhtluse arendamine (Development of human-robot interaction based on multimodal emotion recognition)

    The electronic version of the dissertation does not include the publications. One of the main subjects of interest in affective computing is multimodal emotion recognition, which is applied mainly in human-computer interaction. To recognize emotion, such systems examine both a person's facial expressions and speech. This thesis studies the visual and acoustic features of human emotions and their expression in order to develop an automatic multimodal emotion recognition system. From speech, Mel-frequency cepstral coefficients, the energies of different components of the audio signal and prosodic features are computed. Two different strategies are used to analyze facial expressions. First, various geometric relations between the key points of the face are computed. Second, each video with emotional content is summarized into a reduced set of key-frames, which are given as input to a convolutional neural network that discriminates visually between emotions. The outputs of the three classifiers (1 acoustic, 2 visual) are assembled into a new set of features, which is used for learning in the final stage of the system. The system was tested on the SAVEE, Polish and Serbian emotional speech databases and on the eNTERFACE'05 and RML databases. The results show that, compared with existing systems, the system developed in this work achieves higher accuracy in emotion recognition. In addition, the thesis gives an overview of systems proposed in the literature that are able to recognize emotion-related gestures. The aim of this overview is to make it easier to find new research directions that would help add gesture-based emotion recognition to the system developed here, further raising its emotion recognition accuracy.
Automatic multimodal emotion recognition is a fundamental subject of interest in affective computing. Its main applications are in human-computer interaction. 
The systems developed for this purpose combine different modalities, based on vocal and visual cues. This thesis takes both modalities into account in order to develop an automatic multimodal emotion recognition system. More specifically, it takes advantage of the information extracted from speech and face signals. From speech signals, Mel-frequency cepstral coefficients, filter-bank energies and prosodic features are extracted. Two different strategies are considered for analyzing the facial data. First, geometric relations between facial landmarks, i.e. distances and angles, are computed. Second, each emotional video is summarized into a reduced set of key-frames, and a convolutional neural network is applied to these key-frames to discriminate visually between the emotions. Afterward, the output confidence values of all the classifiers from both modalities are used to define a new feature space. Lastly, a final predictor is learned on this feature space to produce the emotion label, in a late-fusion scheme. The experiments are conducted on the SAVEE, Polish, Serbian, eNTERFACE'05 and RML datasets. The results show significant performance improvements by the proposed system in comparison to the existing alternatives, defining the current state of the art on all the datasets. Additionally, we provide a review of emotional body gesture recognition systems proposed in the literature. The aim of this review is to help identify possible future research directions for enhancing the performance of the proposed system. Specifically, we suggest that incorporating data representing gestures, which constitute another major component of the visual modality, can result in a more efficient framework
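The late-fusion step described in this abstract can be illustrated with a minimal sketch. It assumes three upstream classifiers (one acoustic, two visual) that each emit a confidence vector over the emotion classes. The confidence values, the two-class label set and the nearest-centroid final stage are all hypothetical stand-ins, since the abstract does not say which learner is used in the last stage.

```python
from math import dist

def fuse(confidences):
    """Concatenate per-classifier confidence vectors into one feature vector."""
    return [c for vec in confidences for c in vec]

def fit_centroids(samples, labels):
    """Average the fused vectors of each label (stand-in final-stage learner)."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def predict(centroids, x):
    """Return the label whose centroid is closest to the fused vector x."""
    return min(centroids, key=lambda y: dist(centroids[y], x))

# Made-up confidences over (happy, angry) from audio, landmark and CNN models.
train = [
    fuse([[0.9, 0.1], [0.8, 0.2], [0.7, 0.3]]),
    fuse([[0.6, 0.4], [0.9, 0.1], [0.8, 0.2]]),
    fuse([[0.2, 0.8], [0.3, 0.7], [0.1, 0.9]]),
    fuse([[0.4, 0.6], [0.1, 0.9], [0.2, 0.8]]),
]
train_labels = ["happy", "happy", "angry", "angry"]
centroids = fit_centroids(train, train_labels)

# The audio model leans "angry" here, but both visual models lean "happy".
query = fuse([[0.45, 0.55], [0.8, 0.2], [0.7, 0.3]])
fused_label = predict(centroids, query)
print(fused_label)
```

The point of the design is that the final stage sees every classifier's confidences at once, so it can learn how much to trust each modality rather than taking a fixed vote.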

    ZATLAB: recognizing gestures for artistic performance interaction

    Most artistic performances rely on human gestures, ultimately resulting in an elaborate interaction between the performer and the audience. Humans, even without any formal background in the analysis of music, dance or gesture, are typically able to extract, almost unconsciously, a great amount of relevant information from a gesture. If a gesture contains so much information, why not use it to further enhance a performance? Gestures and expressive communication are intrinsically connected and, being intimately attached to our daily existence, both have a central position in today's technological society. However, the use of technology to understand gestures remains only vaguely explored: it has moved beyond its first steps, but the way towards systems fully capable of analyzing gestures is still long and difficult (Volpe, 2005). This is probably because, while recognizing gestures is a fairly trivial task for humans, translating gestures to the virtual world with a digital encoding is a difficult and ill-defined task. It is necessary to bridge this gap, stimulating a constructive interaction between gestures and technology, culture and science, performance and communication, and thus opening new and unexplored frontiers in the design of a novel generation of multimodal interactive systems. This work proposes an interactive, real-time gesture recognition framework called the Zatlab System (ZtS). This framework is flexible and extensible. Thus, it is in permanent evolution, keeping up with the different technologies and algorithms that emerge at a fast pace nowadays. The basis of the proposed approach is to partition a temporal stream of captured movement into perceptually motivated descriptive features and transmit them for further processing in Machine Learning algorithms. The framework described takes the view that perception primarily depends on previous knowledge or learning. 
Just as humans do, the framework has to learn gestures and their main features so that it can later identify them. It is, however, planned to be flexible enough to allow learning gestures on the fly. This dissertation also presents a qualitative and quantitative experimental validation of the framework. The qualitative analysis provides the results concerning user acceptance of the framework. The quantitative validation provides the results for the gesture recognition algorithms. The use of Machine Learning algorithms in these tasks yields final results that are comparable to or outperform typical and state-of-the-art systems. In addition, two artistic implementations of the framework are presented, assessing its usability in the artistic performance domain. Although a specific implementation of the proposed framework is presented in this dissertation and made available as open source software, the proposed approach is flexible enough to be used in other scenarios, paving the way to applications that can benefit not only the performing arts but also, probably in the near future, other types of communication, such as the sign language used by the hearing impaired.
A large share of artistic performances are based on human gestures, ultimately resulting in an intricate interaction between the performer and the audience. Human beings, even without any training in music, dance or gesture, are able to extract, almost unconsciously, a large amount of relevant information from a gesture. Indeed, a gesture contains so much information, so why not use it to further enrich a performance? Gestures and expressive communication are intrinsically linked and, both being intimately tied to our everyday existence, hold a central position in today's technological society. However, the use of technology to understand gesture is still, to some extent, only vaguely explored. Some developments already exist, but the goal of systems fully capable of analyzing gestures is still far off (Volpe, 2005). This is probably because, while gesture recognition is in a sense a trivial task for human beings, the effort of translating gestures to the virtual world with a digital encoding is a difficult and still ill-defined task. It is necessary to fill this gap somehow, stimulating a constructive interaction between gestures and technology, culture and science, performance and communication, thereby opening new and unexplored frontiers in the design of a new generation of multimodal interactive systems. This work proposes an interactive, real-time gesture recognition framework called the Zatlab System (ZtS). This framework is flexible and extensible. It is therefore in permanent evolution, keeping pace with the different technologies and algorithms that are emerging rapidly nowadays. The proposed approach is based on dividing the temporal sequence of human movement into its descriptive features and transmitting them for further processing in Machine Learning algorithms. The framework described rests on the fact that perception depends mainly on prior knowledge or learning. Thus, just like humans, the framework will have to learn gestures and their main features so that it can later identify them. However, it is designed to be flexible enough to allow gestures to be learned dynamically. This dissertation also presents a qualitative and quantitative experimental validation of the framework. The qualitative analysis provides the results concerning the acceptability of the framework. The quantitative validation provides the results on the gesture recognition algorithms. The use of Machine Learning algorithms in gesture recognition makes it possible to obtain final results that are comparable to or surpass other implementations of the same kind. In addition, two artistic implementations of the framework are also presented, thereby assessing its usability in the domain of artistic performance. Although a specific implementation of the framework is presented in this dissertation and made available as open-source software, the proposed approach is flexible enough to be used in other scenarios, thus opening the way for applications that may benefit not only the performing arts but also, probably in the near future, other types of communication, such as the sign language used in cases of hearing impairment

    What do Collaborations with the Arts Have to Say About Human-Robot Interaction?

    This is a collection of papers presented at the workshop "What Do Collaborations with the Arts Have to Say About HRI", held at the 2010 Human-Robot Interaction Conference in Osaka, Japan

    From systematic semiotic modelling to pseudointentional reference

    Societies as open social systems work through semiotic modelling systems. We view their relevance for shaping primary and secondary needs, as well as metaneeds that are conditioned in social systems. Through conditioning in socialization, semiotic reality can be naturalized up to a level where we can start speaking about not only unconscious, but also unintentional semiosic activity. By that, the very realm of indexicality will be questioned. If indexicality is conjoined with unintended referentiality, then unintentional semiosis means the blurring and fusion of realities far beyond the so-called simulacral semiotic spaces. It is especially acute in the context of the development of technological availabilities where the physical, the semiotic, and the purely virtual reality merge. That quite novel phenomenon is exemplified by semiotic insularization. What follows is that it is hard to define the research object, for the subject is fading away and the real and the virtual are intermingling, also in terms of their inhabitants (biological humans, computer users, avatars, virtual identities). Thus the pragmatic dimension of semiotics is gradually becoming lost. Also, the referential reality is moving farther from the informational space created and represented in “traditional” discursive flows, rather becoming based on pseudoreferential clues of meaning making

    Complexity Reduction in Image-Based Breast Cancer Care

    The diversity of malignancies of the breast requires personalized diagnostic and therapeutic decision making in a complex situation. This thesis contributes in three clinical areas: (1) For clinical diagnostic image evaluation, computer-aided detection and diagnosis of mass and non-mass lesions in breast MRI is developed. 4D texture features characterize mass lesions. For non-mass lesions, a combined detection/characterisation method utilizes the bilateral symmetry of the breast's contrast agent uptake. (2) To improve clinical workflows, a breast MRI reading paradigm is proposed, exemplified by a breast MRI reading workstation prototype. Instead of mouse and keyboard, it is operated using multi-touch gestures. The concept is extended to mammography screening, introducing efficient navigation aids. (3) Contributions to finite element modeling of breast tissue deformations tackle two clinical problems: surgery planning and the prediction of the breast deformation in an MRI biopsy device

    The design and evaluation of an interface and control system for a scariculated rehabilitation robot arm

    This thesis is concerned with the design and development of a prototype implementation of a Rehabilitation Robotic manipulator based on a novel kinematic configuration. The initial aim of the research was to identify appropriate design criteria for the design of a user interface and control system, and for the subsequent evaluation of the manipulator prototype. This led to a review of the field of rehabilitation robotics, focusing on user evaluations of existing systems. The review showed that the design objectives of individual projects were often contradictory, and that a requirement existed for a more general and complete set of design criteria. These were identified through an analysis of the strengths and weaknesses of existing systems, including an assessment of manipulator performance, commercial success and user feedback. The resulting criteria were used for the design and development of a novel interface and control system for the Middlesex Manipulator - the novel scariculated robotic system. A highly modular architecture was adopted, allowing the manipulator to provide a level of adaptability not approached by existing rehabilitation robotic systems. This allowed the interface to be configured to match the controlling ability and input device selections of individual users. A range of input devices was employed, offering variation in communication mode and bandwidth. These included a commercial voice recognition system and a novel gesture recognition device. The latter was designed using electrolytic tilt sensors, the outputs of which were encoded by artificial neural networks. These allowed for control of the manipulator through head or hand gestures. An individual with spinal-cord injury undertook a single-subject user evaluation of the Middlesex Manipulator over a period of four months. The evaluation provided evidence for the value of adaptability presented by the user interface. 
It was also shown that the prototype did not currently conform to all the design criteria, but allowed for the identification of areas for design improvements. This work led to a second research objective, concerned with the problem of configuring an adaptable user interface for a specific individual. A novel form of task analysis is presented within the thesis that allows the relative usability of interface configurations to be predicted based upon individual user and input device characteristics. An experiment was undertaken with 6 subjects performing 72 task runs with 2 interface configurations controlled by user gestures. Task completion times fell within the range predicted, where the range was generated using confidence intervals (α = 0.05) on point estimates of user and device characteristics. This allowed successful prediction over all task runs of the relative task completion times of interface configurations for a given user
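The idea of a predicted range built from confidence intervals at α = 0.05 can be sketched as follows. This is only an illustration and not the thesis's task-analysis method: the calibration timings, the observed time and the assumption that a task's duration scales linearly with its number of gestures are all hypothetical, and the interval uses a normal approximation (a Student t interval would be wider for a sample this small).

```python
from statistics import NormalDist, mean, stdev

def confidence_interval(samples, alpha=0.05):
    """Two-sided (1 - alpha) interval for the mean, normal approximation."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    centre = mean(samples)
    half_width = z * stdev(samples) / len(samples) ** 0.5
    return centre - half_width, centre + half_width

def predicted_range(per_gesture_times, gestures_in_task, alpha=0.05):
    """Scale the per-gesture interval to a whole task (hypothetical model)."""
    low, high = confidence_interval(per_gesture_times, alpha)
    return low * gestures_in_task, high * gestures_in_task

# Made-up calibration data: seconds one user needed per gesture.
calibration = [2.0, 2.2, 1.8, 2.1, 1.9]
low, high = predicted_range(calibration, gestures_in_task=10)

observed = 20.4  # made-up completion time for a 10-gesture task
inside = low <= observed <= high
print(f"predicted {low:.1f} to {high:.1f} s, observed inside range: {inside}")
```

Comparing two interface configurations would then amount to computing such a range for each and checking which is predicted to be faster for a given user.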

    Joint optimization of manifold learning and sparse representations for face and gesture analysis

    Face and gesture understanding algorithms are powerful enablers in intelligent vision systems for surveillance, security, entertainment, and smart spaces. In the future, complex networks of sensors and cameras may give directions to lost tourists, perform directory lookups in the office lobby, or contact the proper authorities in case of an emergency. To be effective, these systems will need to embrace human subtleties while interacting with people in their natural conditions. Computer vision and machine learning techniques have recently become adept at solving face and gesture tasks using posed datasets in controlled conditions. However, spontaneous human behavior under unconstrained conditions, or in the wild, is more complex and is subject to considerable variability from one person to the next. Uncontrolled conditions such as lighting, resolution, noise, occlusions, pose, and temporal variations complicate the matter further. This thesis advances the field of face and gesture analysis by introducing a new machine learning framework based upon dimensionality reduction and sparse representations that is shown to be robust in posed as well as natural conditions. Dimensionality reduction methods take complex objects, such as facial images, and attempt to learn lower dimensional representations embedded in the higher dimensional data. These alternate feature spaces are computationally more efficient and often more discriminative. The performance of various dimensionality reduction methods on geometric and appearance-based facial attributes is studied, leading to robust facial pose and expression recognition models. The parsimonious nature of sparse representations (SR) has successfully been exploited for the development of highly accurate classifiers for various applications. Despite the successes of SR techniques, large dictionaries and high dimensional data can make these classifiers computationally demanding. 
Further, sparse classifiers are subject to the adverse effects of a phenomenon known as coefficient contamination, where for example variations in pose may affect identity and expression recognition. This thesis analyzes the interaction between dimensionality reduction and sparse representations to present a unified sparse representation classification framework that addresses both issues of computational complexity and coefficient contamination. Semi-supervised dimensionality reduction is shown to mitigate the coefficient contamination problems associated with SR classifiers. The combination of semi-supervised dimensionality reduction with SR systems forms the cornerstone for a new face and gesture framework called Manifold based Sparse Representations (MSR). MSR is shown to deliver state-of-the-art facial understanding capabilities. To demonstrate the applicability of MSR to new domains, MSR is expanded to include temporal dynamics. The joint optimization of dimensionality reduction and SRs for classification purposes is a relatively new field. The combination of both concepts into a single objective function produces a relation that is neither convex nor directly solvable. This thesis studies this problem to introduce a new jointly optimized framework. This framework, termed LGE-KSVD, utilizes variants of Linear extension of Graph Embedding (LGE) along with modified K-SVD dictionary learning to jointly learn the dimensionality reduction matrix, sparse representation dictionary, sparse coefficients, and sparsity-based classifier. By injecting LGE concepts directly into the K-SVD learning procedure, this research removes the support constraints K-SVD imparts on dictionary element discovery. Results are shown for facial recognition, facial expression recognition, and human activity analysis; with the addition of a concept called active difference signatures, the framework also delivers robust gesture recognition from Kinect or similar depth cameras
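For readers unfamiliar with sparse representation classifiers, the generic idea can be sketched in a few lines. This is not the MSR or LGE-KSVD framework from the thesis: it assumes a fixed, hand-made dictionary with hypothetical labels `person_a` and `person_b`, uses plain greedy matching pursuit as the sparse coder, and skips dimensionality reduction and dictionary learning entirely. A test signal is coded sparsely over all atoms, then assigned to the class whose atoms alone leave the smallest reconstruction residual.

```python
from math import sqrt

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(v):
    n = sqrt(dot(v, v))
    return [x / n for x in v]

def matching_pursuit(atoms, signal, n_nonzero):
    """Greedy sparse coding: at most n_nonzero atoms receive a coefficient."""
    coeffs = [0.0] * len(atoms)
    residual = list(signal)
    for _ in range(n_nonzero):
        scores = [dot(residual, atom) for atom in atoms]
        best = max(range(len(atoms)), key=lambda j: abs(scores[j]))
        coeffs[best] += scores[best]
        residual = [r - scores[best] * a for r, a in zip(residual, atoms[best])]
    return coeffs

def classify(atoms, atom_labels, signal, n_nonzero=2):
    """Pick the class whose atoms alone reconstruct the signal best."""
    coeffs = matching_pursuit(atoms, signal, n_nonzero)
    residuals = {}
    for label in set(atom_labels):
        recon = [0.0] * len(signal)
        for c, atom, lab in zip(coeffs, atoms, atom_labels):
            if lab == label:
                recon = [r + c * a for r, a in zip(recon, atom)]
        residuals[label] = sqrt(sum((s - r) ** 2 for s, r in zip(signal, recon)))
    return min(residuals, key=residuals.get), residuals

# Toy dictionary: two unit-norm training atoms per (hypothetical) identity.
atoms = [normalize(a) for a in ([1, 0, 0], [0.8, 0.6, 0], [0, 0, 1], [0, 0.6, 0.8])]
atom_labels = ["person_a", "person_a", "person_b", "person_b"]

label, residuals = classify(atoms, atom_labels, [1.0, 0.2, 0.1])
print(label, residuals)
```

In this toy run one of the two selected atoms belongs to the other class, which gives a small-scale picture of how coefficients can leak across classes. The cost of the coding step grows with dictionary size and signal dimension, which is the computational concern the abstract raises.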