55 research outputs found

    Quel lexique pour un traitement automatique de la référence ?

    Lexicon and reference processing.

    Biomedical applications of belief networks

    Biomedicine is an area in which computers have long been expected to play a significant role. Although many of the early claims have proved unrealistic, computers are gradually becoming accepted in the biomedical, clinical and research environment. Within these application areas, expert systems appear to have met with the most resistance, especially when applied to image interpretation. In order to improve the acceptance of computerised decision support systems it is necessary to provide the information needed to make rational judgements concerning the inferences the system has made. This entails an explanation of what inferences were made, how the inferences were made and how the results of the inference are to be interpreted. Furthermore there must be a consistent approach to the combining of information from low level computational processes through to high level expert analyses. Until recently ad hoc formalisms were seen as the only tractable approach to reasoning under uncertainty. A review of some of these formalisms suggests that they are less than ideal for the purposes of decision making. Belief networks provide a tractable way of utilising probability theory as an inference formalism by combining the theoretical consistency of probability for inference and decision making, with the ability to use the knowledge of domain experts. The potential of belief networks in biomedical applications has already been recognised and there has been substantial research into the use of belief networks for medical diagnosis and methods for handling large, interconnected networks. In this thesis the use of belief networks is extended to include detailed image model matching to show how, in principle, feature measurement can be undertaken in a fully probabilistic way.
The belief networks employed are usually cyclic and have strong influences between adjacent nodes, so new techniques for probabilistic updating based on a model of the matching process have been developed. An object-orientated inference shell called FLAPNet has been implemented and used to apply the belief network formalism to two application domains. The first application is model-based matching in fetal ultrasound images. The imaging modality and biological variation in the subject make model matching a highly uncertain process. A dynamic, deformable model, similar to active contour models, is used. A belief network combines constraints derived from local evidence in the image, with global constraints derived from trained models, to control the iterative refinement of an initial model cue. In the second application a belief network is used for the incremental aggregation of evidence occurring during the classification of objects on a cervical smear slide as part of an automated pre-screening system. A belief network provides both an explicit domain model and a mechanism for the incremental aggregation of evidence, two attributes important in pre-screening systems. Overall it is argued that belief networks combine the necessary quantitative features required of a decision support system with desirable qualitative features that will lead to improved acceptability of expert systems in the biomedical domain.

    A review of affective computing: From unimodal analysis to multimodal fusion

    Affective computing is an emerging interdisciplinary research field bringing together researchers and practitioners from various fields, ranging from artificial intelligence and natural language processing to cognitive and social sciences. With the proliferation of videos posted online (e.g., on YouTube, Facebook, Twitter) for product reviews, movie reviews, political views, and more, affective computing research has increasingly evolved from conventional unimodal analysis to more complex forms of multimodal analysis. This is the primary motivation behind our first-of-its-kind, comprehensive literature review of the diverse field of affective computing. Furthermore, existing literature surveys lack a detailed discussion of the state of the art in multimodal affect analysis frameworks, which this review aims to address. Multimodality is defined by the presence of more than one modality or channel, e.g., visual, audio, text, gestures, and eye gaze. In this paper, we focus mainly on the use of audio, visual and text information for multimodal affect analysis, since around 90% of the relevant literature appears to cover these three modalities. Following an overview of different techniques for unimodal affect analysis, we outline existing methods for fusing information from different modalities. As part of this review, we carry out an extensive study of different categories of state-of-the-art fusion techniques, followed by a critical analysis of potential performance improvements with multimodal analysis compared to unimodal analysis. A comprehensive overview of these two complementary fields aims to form the building blocks for readers, to better understand this challenging and exciting research field.

    Supervision distante pour l'apprentissage de structures discursives dans les conversations multi-locuteurs

    The main objective of this thesis is to improve the automatic capture of semantic information with the goal of modeling and understanding human communication. We have advanced the state of the art in discourse parsing, in particular in the retrieval of discourse structure from chat, in order to implement, at the industrial level, tools to help explore conversations. These include the production of automatic summaries, recommendations, dialogue act detection, identification of decisions, planning and semantic relations between dialogue acts in order to understand dialogues. In multi-party conversations it is important to understand not only the meaning of a participant's utterance and to whom it is addressed, but also the semantic relations that tie it to other utterances in the conversation and give rise to different conversation threads. An answer must be recognized as an answer to a particular question; an argument, as an argument for or against a proposal under discussion; a disagreement, as the expression of a point of view contrasted with another idea already expressed.
Unfortunately, capturing such information using traditional supervised machine learning methods from quality hand-annotated discourse data is costly and time-consuming, and we do not have nearly enough data to train these machine learning models, much less deep learning models. Another problem is that, arguably, no amount of data will be sufficient for machine learning models to learn the semantic characteristics of discourse relations without some expert guidance; the data are simply too sparse. Long distance relations, in which an utterance is semantically connected not to the immediately preceding utterance, but to another utterance from further back in the conversation, are particularly difficult and rare, though often central to comprehension. It is therefore necessary to find a more efficient way to retrieve discourse structures from large corpora of multi-party conversations, such as meeting transcripts or chats. This is one goal this thesis achieves. In addition, we wanted not only to design a model that predicts discourse structure for multi-party conversation without requiring large amounts of hand-annotated data, but also to develop an approach that is transparent and explainable so that it can be modified and improved by experts. The method detailed in this thesis achieves this goal as well.

    Automatic recognition of multiparty human interactions using dynamic Bayesian networks

    Relating statistical machine learning approaches to the automatic analysis of multiparty communicative events, such as meetings, is an ambitious research area. We have investigated automatic meeting segmentation both in terms of “Meeting Actions” and “Dialogue Acts”. Dialogue acts model the discourse structure at a fine-grained level, highlighting individual speaker intentions. Group meeting actions describe the same process at a coarse level, highlighting interactions between different meeting participants and showing overall group intentions. A framework based on probabilistic graphical models such as dynamic Bayesian networks (DBNs) has been investigated for both tasks. Our first set of experiments is concerned with the segmentation and structuring of meetings (recorded using multiple cameras and microphones) into sequences of group meeting actions such as monologue, discussion and presentation. We outline four families of multimodal features based on speaker turns, lexical transcription, prosody, and visual motion that are extracted from the raw audio and video recordings. We relate these low-level multimodal features to complex group behaviours, proposing a multistream modelling framework based on dynamic Bayesian networks. Later experiments are concerned with the automatic recognition of Dialogue Acts (DAs) in multiparty conversational speech. We present a joint generative approach based on a switching DBN for DA recognition in which segmentation and classification of DAs are carried out in parallel. This approach models a set of features, related to lexical content and prosody, and incorporates a weighted interpolated factored language model. In conjunction with this joint generative model, we have also investigated the use of a discriminative approach, based on conditional random fields, to perform a reclassification of the segmented DAs.
The DBN-based approach yielded significant improvements when applied both to the meeting action and the dialogue act recognition task. On both tasks, the DBN framework provided an effective factorisation of the state-space and a flexible infrastructure able to integrate a heterogeneous set of resources such as continuous and discrete multimodal features, and statistical language models. Although our experiments have been principally targeted at multiparty meetings, the features, models, and methodologies developed in this thesis can be employed for a wide range of applications. Moreover, both group meeting actions and DAs offer valuable insights about the current conversational context, providing cues and features for several related research areas such as speaker addressing and focus of attention modelling, automatic speech recognition and understanding, and topic and decision detection.

    Generating automated meeting summaries

    The thesis at hand introduces a novel approach for the generation of abstractive summaries of meetings. While the automatic generation of document summaries has been studied for some decades now, the novelty of this thesis is mainly the application to the meeting domain (instead of text documents) as well as the use of a lexicalized representation formalism on the basis of Frame Semantics. This allows us to generate summaries abstractively (instead of extractively). We argue that abstractive approaches are better suited than extractive ones to summarising spontaneous spoken interactions.

    Multimodal interaction with mobile devices : fusing a broad spectrum of modality combinations

    This dissertation presents a multimodal architecture for use in mobile scenarios such as shopping and navigation. It also analyses a wide range of feasible modality input combinations for these contexts. For this purpose, two interlinked demonstrators were designed for stand-alone use on mobile devices. Of particular importance was the design and implementation of a modality fusion module capable of combining input from a range of communication modes like speech, handwriting, and gesture. The implementation is able to account for confidence value biases arising within and between modalities and also provides a method for resolving semantically overlapped input. Tangible interaction with real-world objects and symmetric multimodality are two further themes addressed in this work. The work concludes with the results from two usability field studies that provide insight on user preference and modality intuition for different modality combinations, as well as user acceptance for anthropomorphized objects.

    Designing Embodied Interactive Software Agents for E-Learning: Principles, Components, and Roles

    Embodied interactive software agents are complex autonomous, adaptive, and social software systems with a digital embodiment that enables them to act on and react to other entities (users, objects, and other agents) in their environment through bodily actions, which include the use of verbal and non-verbal communicative behaviors in face-to-face interactions with the user. These agents have been developed for various roles in different application domains, in which they perform tasks that have been assigned to them by their developers or delegated to them by their users or by other agents. In computer-assisted learning, embodied interactive pedagogical software agents have the general task of promoting human learning by working with students (and other agents) in computer-based learning environments, among them e-learning platforms based on Internet technologies, such as the Virtual Linguistics Campus (www.linguistics-online.com). In these environments, pedagogical agents provide contextualized, qualified, personalized, and timely assistance, cooperation, instruction, motivation, and services for both individual learners and groups of learners. This thesis develops a comprehensive, multidisciplinary, and user-oriented view of the design of embodied interactive pedagogical software agents, which integrates theoretical and practical insights from various academic and other fields. The research intends to contribute to the scientific understanding of issues, methods, theories, and technologies that are involved in the design, implementation, and evaluation of embodied interactive software agents for different roles in e-learning and other areas.
For developers, the thesis provides sixteen basic principles (Added Value, Perceptible Qualities, Balanced Design, Coherence, Consistency, Completeness, Comprehensibility, Individuality, Variability, Communicative Ability, Modularity, Teamwork, Participatory Design, Role Awareness, Cultural Awareness, and Relationship Building) plus a large number of specific guidelines for the design of embodied interactive software agents and their components. Furthermore, it offers critical reviews of theories, concepts, approaches, and technologies from different areas and disciplines that are relevant to agent design. Finally, it discusses three pedagogical agent roles (virtual native speaker, coach, and peer) in the scenario of the linguistic fieldwork classes on the Virtual Linguistics Campus and presents detailed considerations for the design of an agent for one of these roles (the virtual native speaker).