What lexicon for automatic processing of reference?
Biomedical applications of belief networks
Biomedicine is an area in which computers have long been expected to play a significant
role. Although many of the early claims have proved unrealistic, computers are gradually
becoming accepted in the biomedical, clinical and research environment. Within these
application areas, expert systems appear to have met with the most resistance, especially
when applied to image interpretation.
In order to improve the acceptance of computerised decision support systems it is
necessary to provide the information needed to make rational judgements concerning
the inferences the system has made. This entails an explanation of what inferences
were made, how the inferences were made and how the results of the inference are to
be interpreted. Furthermore, there must be a consistent approach to the combining of
information from low level computational processes through to high level expert analyses.
Until recently ad hoc formalisms were seen as the only tractable approach to reasoning
under uncertainty. A review of some of these formalisms suggests that they are less
than ideal for the purposes of decision making. Belief networks provide a tractable way
of utilising probability theory as an inference formalism by combining the theoretical
consistency of probability for inference and decision making, with the ability to use the
knowledge of domain experts.
The potential of belief networks in biomedical applications has already been recognised
and there has been substantial research into the use of belief networks for medical
diagnosis and methods for handling large, interconnected networks. In this thesis the use
of belief networks is extended to include detailed image model matching to show how,
in principle, feature measurement can be undertaken in a fully probabilistic way. The
belief networks employed are usually cyclic and have strong influences between adjacent
nodes, so new techniques for probabilistic updating based on a model of the matching
process have been developed.
An object-oriented inference shell called FLAPNet has been implemented and used
to apply the belief network formalism to two application domains. The first application is
model-based matching in fetal ultrasound images. The imaging modality and biological
variation in the subject make model matching a highly uncertain process. A dynamic,
deformable model, similar to active contour models, is used. A belief network combines
constraints derived from local evidence in the image, with global constraints derived from
trained models, to control the iterative refinement of an initial model cue.
In the second application a belief network is used for the incremental aggregation of
evidence occurring during the classification of objects on a cervical smear slide as part of
an automated pre-screening system. A belief network provides both an explicit domain
model and a mechanism for the incremental aggregation of evidence, two attributes
important in pre-screening systems.
Overall it is argued that belief networks combine the necessary quantitative features
required of a decision support system with desirable qualitative features that will lead
to improved acceptability of expert systems in the biomedical domain.
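Belief-network inference of the kind described can be illustrated with a minimal sketch. The network, prior, and conditional probabilities below are invented for illustration; the networks in the thesis are far larger and use specialised updating schemes for cyclic structures.

```python
# Minimal two-node belief network: Disease -> Test.
# Hypothetical CPTs chosen for illustration only.

p_d = 0.01                               # prior P(disease)
p_t_given_d = {True: 0.9, False: 0.05}   # P(test positive | disease present / absent)

def posterior_disease(test_positive: bool) -> float:
    """P(disease | test result) by direct enumeration (Bayes' rule)."""
    like_d = p_t_given_d[True] if test_positive else 1 - p_t_given_d[True]
    like_not = p_t_given_d[False] if test_positive else 1 - p_t_given_d[False]
    num = like_d * p_d
    return num / (num + like_not * (1 - p_d))

print(round(posterior_disease(True), 3))  # a positive test raises the prior substantially
```

Even this toy example shows why such networks aid acceptance: the inference is an explicit chain of probabilistic steps that can be inspected and explained, rather than an ad hoc score.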
A review of affective computing: From unimodal analysis to multimodal fusion
Affective computing is an emerging interdisciplinary research field bringing together researchers and practitioners from various fields, ranging from artificial intelligence and natural language processing to cognitive and social sciences. With the proliferation of videos posted online (e.g., on YouTube, Facebook, Twitter) for product reviews, movie reviews, political views, and more, affective computing research has increasingly evolved from conventional unimodal analysis to more complex forms of multimodal analysis. This is the primary motivation behind our first-of-its-kind, comprehensive literature review of the diverse field of affective computing. Furthermore, existing literature surveys lack a detailed discussion of the state of the art in multimodal affect analysis frameworks, which this review aims to address. Multimodality is defined by the presence of more than one modality or channel, e.g., visual, audio, text, gestures, and eye gaze. In this paper, we focus mainly on the use of audio, visual and text information for multimodal affect analysis, since around 90% of the relevant literature appears to cover these three modalities. Following an overview of different techniques for unimodal affect analysis, we outline existing methods for fusing information from different modalities. As part of this review, we carry out an extensive study of different categories of state-of-the-art fusion techniques, followed by a critical analysis of potential performance improvements with multimodal analysis compared to unimodal analysis. A comprehensive overview of these two complementary fields aims to form the building blocks for readers to better understand this challenging and exciting research field.
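Decision-level ("late") fusion, one of the fusion categories such surveys cover, can be sketched as a weighted average of per-modality class probabilities. The modality weights and scores below are invented; real systems typically learn such weights from data.

```python
# Late-fusion sketch: each unimodal classifier emits class probabilities,
# which are combined by a normalised weighted average.

def late_fusion(scores_per_modality, weights):
    """Weighted average of per-modality class-probability vectors."""
    assert len(scores_per_modality) == len(weights)
    total = sum(weights)
    n_classes = len(scores_per_modality[0])
    fused = [0.0] * n_classes
    for scores, w in zip(scores_per_modality, weights):
        for i, s in enumerate(scores):
            fused[i] += (w / total) * s
    return fused

audio  = [0.2, 0.8]   # e.g. P(negative), P(positive) from an audio model
visual = [0.6, 0.4]
text   = [0.1, 0.9]
fused = late_fusion([audio, visual, text], weights=[1.0, 0.5, 1.5])
print(fused)  # text, weighted highest here, pulls the decision positive
```

Feature-level ("early") fusion would instead concatenate the raw modality features before classification; the trade-off between the two is a central theme of the multimodal literature.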
Distant supervision for learning discourse structures in multi-party conversations
The main objective of this thesis is to improve the automatic capture of semantic information, with the goal of modeling and understanding human communication. We have advanced the state of the art in discourse parsing, in particular in the retrieval of discourse structure from chat, in order to implement, at the industrial level, tools to help explore conversations. These include the production of automatic summaries, recommendations, dialogue act detection, identification of decisions, planning, and semantic relations between dialogue acts in order to understand dialogues. In multi-party conversations it is important to understand not only the meaning of a participant's utterance and to whom it is addressed, but also the semantic relations that tie it to other utterances in the conversation and give rise to different conversation threads. An answer must be recognized as an answer to a particular question; an argument, as an argument for or against a proposal under discussion; a disagreement, as the expression of a point of view contrasted with another idea already expressed.
Unfortunately, capturing such information using traditional supervised machine learning methods from quality hand-annotated discourse data is costly and time-consuming, and we do not have nearly enough data to train these machine learning models, much less deep learning models. Another problem is that, arguably, no amount of data will be sufficient for machine learning models to learn the semantic characteristics of discourse relations without some expert guidance; the data are simply too sparse. Long-distance relations, in which an utterance is semantically connected not to the immediately preceding utterance but to another utterance from further back in the conversation, are particularly difficult and rare, though often central to comprehension. It is therefore necessary to find a more efficient way to retrieve discourse structures from large corpora of multi-party conversations, such as meeting transcripts or chats. This is one goal this thesis achieves. In addition, we not only wanted to design a model that predicts discourse structure for multi-party conversation without requiring large amounts of hand-annotated data, but also to develop an approach that is transparent and explainable so that it can be modified and improved by experts. The method detailed in this thesis achieves this goal as well.
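The thread structure described above can be made concrete as a labelled dependency graph over utterance indices. The dialogue, relation labels, and gap threshold below are invented for illustration; the thesis's relation inventory and attachment criteria are richer.

```python
# A discourse structure for a multi-party chat as labelled edges
# (source, target, relation), where the target attaches to the source.

utterances = [
    "A: Shall we meet Tuesday?",    # 0
    "B: I'm busy Tuesday.",         # 1
    "C: Tuesday works for me.",     # 2
    "A: Okay, Wednesday then?",     # 3
]
edges = [
    (0, 1, "QA-pair"),
    (0, 2, "QA-pair"),   # long-distance: 2 answers 0, not its neighbour 1
    (1, 3, "Result"),
]

def long_distance(edges, gap=2):
    """Relations whose endpoints are at least `gap` utterances apart."""
    return [e for e in edges if e[1] - e[0] >= gap]

print(long_distance(edges))
```

Separating the graph from the token sequence like this is what makes multiple simultaneous conversation threads representable at all.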
Automatic recognition of multiparty human interactions using dynamic Bayesian networks
Relating statistical machine learning approaches to the automatic analysis of multiparty
communicative events, such as meetings, is an ambitious research area. We
have investigated automatic meeting segmentation both in terms of "Meeting Actions"
and "Dialogue Acts". Dialogue acts model the discourse structure at a fine-grained
level, highlighting individual speaker intentions. Group meeting actions describe
the same process at a coarse level, highlighting interactions between different
meeting participants and showing overall group intentions.
A framework based on probabilistic graphical models such as dynamic Bayesian
networks (DBNs) has been investigated for both tasks. Our first set of experiments
is concerned with the segmentation and structuring of meetings (recorded using
multiple cameras and microphones) into sequences of group meeting actions such
as monologue, discussion and presentation. We outline four families of multimodal
features based on speaker turns, lexical transcription, prosody, and visual motion
that are extracted from the raw audio and video recordings. We relate these low-level
multimodal features to complex group behaviours, proposing a multistream modelling
framework based on dynamic Bayesian networks. Later experiments are
concerned with the automatic recognition of Dialogue Acts (DAs) in multiparty
conversational speech. We present a joint generative approach based on a switching
DBN for DA recognition in which segmentation and classification of DAs are
carried out in parallel. This approach models a set of features, related to lexical
content and prosody, and incorporates a weighted interpolated factored language
model. In conjunction with this joint generative model, we have also investigated
the use of a discriminative approach, based on conditional random fields, to perform
a reclassification of the segmented DAs.
The DBN based approach yielded significant improvements when applied both
to the meeting action and the dialogue act recognition task. On both tasks, the DBN
framework provided an effective factorisation of the state-space and a flexible infrastructure
able to integrate a heterogeneous set of resources such as continuous
and discrete multimodal features, and statistical language models. Although our
experiments have principally targeted multiparty meetings, the features, models,
and methodologies developed in this thesis can be employed for a wide range
of applications. Moreover, both group meeting actions and DAs offer valuable insights into the current conversational context, providing useful cues and features
for several related research areas such as speaker addressing and focus of attention
modelling, automatic speech recognition and understanding, and topic and decision detection.
Generating automated meeting summaries
The thesis at hand introduces a novel approach for the generation of abstractive summaries of meetings. While the automatic generation of document summaries has been studied for some decades now, the novelty of this thesis is mainly the application to the meeting domain (instead of text documents) as well as the use of a lexicalized representation formalism on the basis of Frame Semantics. This allows us to generate summaries abstractively (instead of extractively). We argue that abstractive approaches are better suited than extractive ones for summarizing spontaneous spoken interactions.
Multimodal interaction with mobile devices : fusing a broad spectrum of modality combinations
This dissertation presents a multimodal architecture for use in mobile scenarios such as shopping and navigation. It also analyses a wide range of feasible modality input combinations for these contexts. For this purpose, two interlinked demonstrators were designed for stand-alone use on mobile devices. Of particular importance was the design and implementation of a modality fusion module capable of combining input from a range of communication modes like speech, handwriting, and gesture. The implementation is able to account for confidence value biases arising within and between modalities and also provides a method for resolving semantically overlapped input. Tangible interaction with real-world objects and symmetric multimodality are two further themes addressed in this work. The work concludes with the results from two usability field studies that provide insight on user preference and modality intuition for different modality combinations, as well as user acceptance for anthropomorphized objects.
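Confidence-based resolution of semantically overlapped input can be sketched as picking, per semantic slot, the hypothesis with the highest bias-corrected confidence. The modality biases, slot values, and scores below are invented; the actual fusion module described above is considerably richer.

```python
# Resolve competing values for one semantic slot (e.g. "destination")
# proposed by different modalities, correcting each raw confidence by a
# per-modality bias factor (all numbers hypothetical).

MODALITY_BIAS = {"speech": 1.0, "gesture": 0.8, "handwriting": 0.9}

def resolve(slot_hypotheses):
    """slot_hypotheses: list of (modality, value, raw_confidence)."""
    return max(slot_hypotheses,
               key=lambda h: MODALITY_BIAS[h[0]] * h[2])[1]

hyps = [("speech", "museum", 0.60), ("gesture", "cafe", 0.90)]
print(resolve(hyps))  # gesture wins: 0.8 * 0.9 = 0.72 > 0.60
```

The bias factors stand in for the within- and between-modality confidence corrections the abstract mentions; in practice they would be calibrated per recognizer.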
Designing Embodied Interactive Software Agents for E-Learning: Principles, Components, and Roles
Embodied interactive software agents are complex autonomous, adaptive, and social software systems with a digital embodiment that enables them to act on and react to other entities (users, objects, and other agents) in their environment through bodily actions, which include the use of verbal and non-verbal communicative behaviors in face-to-face interactions with the user. These agents have been developed for various roles in different application domains, in which they perform tasks that have been assigned to them by their developers or delegated to them by their users or by other agents. In computer-assisted learning, embodied interactive pedagogical software agents have the general task to promote human learning by working with students (and other agents) in computer-based learning environments, among them e-learning platforms based on Internet technologies, such as the Virtual Linguistics Campus (www.linguistics-online.com). In these environments, pedagogical agents provide contextualized, qualified, personalized, and timely assistance, cooperation, instruction, motivation, and services for both individual learners and groups of learners.
This thesis develops a comprehensive, multidisciplinary, and user-oriented view of the design of embodied interactive pedagogical software agents, which integrates theoretical and practical insights from various academic and other fields. The research intends to contribute to the scientific understanding of issues, methods, theories, and technologies that are involved in the design, implementation, and evaluation of embodied interactive software agents for different roles in e-learning and other areas. For developers, the thesis provides sixteen basic principles (Added Value, Perceptible Qualities, Balanced Design, Coherence, Consistency, Completeness, Comprehensibility, Individuality, Variability, Communicative Ability, Modularity, Teamwork, Participatory Design, Role Awareness, Cultural Awareness, and Relationship Building) plus a large number of specific guidelines for the design of embodied interactive software agents and their components. Furthermore, it offers critical reviews of theories, concepts, approaches, and technologies from different areas and disciplines that are relevant to agent design. Finally, it discusses three pedagogical agent roles (virtual native speaker, coach, and peer) in the scenario of the linguistic fieldwork classes on the Virtual Linguistics Campus and presents detailed considerations for the design of an agent for one of these roles (the virtual native speaker).