New methods, techniques and applications for sketch recognition
2012-2013
The use of diagrams is common in various disciplines. Typical examples
include maps, line graphs, bar charts, engineering blueprints, architects'
sketches, and hand drawn schematics. In general, diagrams can be created
either by using pen and paper, or by using specific computer programs. These
programs provide functions to facilitate the creation of the diagram, such as
copy-and-paste, but the classic WIMP interfaces they use are unnatural when
compared to pen and paper. Indeed, it is not uncommon for a designer to
prefer pen and paper at the beginning of the design and to transfer the
diagram to the computer later.
To avoid this double step, a solution is to allow users to sketch directly on
the computer. This requires both specific hardware and sketch recognition
based software. As regards hardware, many pen/touch based devices such as
tablets, smartphones, and interactive boards and tables are available today
at reasonable cost. Sketch recognition is needed whenever the sketch must be
processed rather than treated as a simple image, and it is crucial to the
success of this new modality of interaction. It is a difficult problem due to
the inherent imprecision and ambiguity of freehand drawing and to the many
application domains. The aim of this thesis is to propose new methods and
applications for sketch recognition. The presentation of the results is
divided into several contributions, addressing problems such as corner
detection, sketched symbol recognition and autocompletion, graphical context
detection, and sketched Euler diagram interpretation.
The first contribution regards the problem of detecting the corners present
in a stroke. Corner detection is often performed during preprocessing to
segment a stroke into simple geometric primitives such as lines or curves.
The corner recognizer proposed in this thesis, RankFrag, is inspired by the
method proposed by Ouyang and Davis in 2011 and improves on the accuracy of
other methods recently proposed in the literature.
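The thesis does not reproduce RankFrag here, but the general idea of corner-based stroke segmentation can be sketched with a naive curvature test; the window size and angle threshold below are arbitrary illustrative choices, not RankFrag's:

```python
import math

def corners(points, angle_thresh_deg=60.0, window=2):
    """Flag points where the polyline turns sharply.

    points: list of (x, y) stroke samples. Returns indices of detected
    corners (stroke endpoints excluded). A point is a corner when the angle
    between the incoming and outgoing direction vectors, taken `window`
    samples apart, exceeds the threshold.
    """
    found = []
    for i in range(window, len(points) - window):
        ax, ay = points[i - window]
        bx, by = points[i]
        cx, cy = points[i + window]
        v1 = (bx - ax, by - ay)   # incoming direction
        v2 = (cx - bx, cy - by)   # outgoing direction
        n1, n2 = math.hypot(*v1), math.hypot(*v2)
        if n1 == 0 or n2 == 0:
            continue
        cos_t = max(-1.0, min(1.0, (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)))
        if math.degrees(math.acos(cos_t)) > angle_thresh_deg:
            found.append(i)
    return found

# An L-shaped stroke: horizontal run, then a 90-degree turn upward.
stroke = [(x, 0) for x in range(6)] + [(5, y) for y in range(1, 6)]
print(corners(stroke))  # → [5]
```

Real strokes are noisy, which is why published detectors add resampling, ranking, or cost-based fragmentation on top of a raw curvature test like this one.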
The second contribution is a new method for recognizing multi-stroke hand
drawn symbols, invariant with respect to scaling and independent of the
number and order of strokes. The method adapts the algorithm proposed by
Belongie et al. in 2002 to the case of sketched images, which is achieved by
using stroke related information.
The method has been evaluated on a set of more than 100 symbols from
the Military Course of Action domain and the results show that the new
recognizer outperforms the original one.
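A minimal sketch of the polar-histogram descriptor behind shape contexts (Belongie et al., 2002) may help: each point gets a log-polar histogram of where the other points lie. This is the generic image-based descriptor, without the stroke-related extensions the thesis adds, and the bin counts are arbitrary:

```python
import math

def shape_context(points, i, n_r=3, n_theta=8, r_max=2.0):
    """Log-polar histogram of the other points' positions relative to points[i].

    Each bin counts points falling in one (log-radius, angle) cell. Radii are
    normalized by the mean distance from the reference point, which makes the
    descriptor scale-invariant.
    """
    xi, yi = points[i]
    others = [(x, y) for j, (x, y) in enumerate(points) if j != i]
    dists = [math.hypot(x - xi, y - yi) for x, y in others]
    mean_d = sum(dists) / len(dists)
    hist = [[0] * n_theta for _ in range(n_r)]
    for (x, y), d in zip(others, dists):
        r = d / mean_d
        # log-spaced radial bins over (0, r_max], clipped at the outer bin
        r_bin = min(n_r - 1, int(n_r * math.log1p(r) / math.log1p(r_max)))
        theta = math.atan2(y - yi, x - xi) % (2 * math.pi)
        t_bin = min(n_theta - 1, int(theta / (2 * math.pi) * n_theta))
        hist[r_bin][t_bin] += 1
    return hist

pts = [(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]
h = shape_context(pts, 0)
print(sum(sum(row) for row in h))  # → 4 (each other point lands in one bin)
```

Matching then reduces to comparing histograms between candidate point correspondences, e.g. with a chi-squared distance.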
The third contribution is a new method for recognizing multi-stroke,
partially hand drawn symbols, invariant with respect to scale and
independent of the number and order of strokes. The recognition technique is
based on subgraph isomorphism and exploits a novel spatial descriptor, based
on polar histograms, to represent the relation between two stroke primitives.
The tests show that the approach achieves a satisfactory recognition rate on
partially drawn symbols, even at very low levels of drawing completion, and
outperforms existing approaches in the literature. Furthermore, as an
application, a system
presenting a user interface to draw symbols and implementing the proposed
autocompletion approach has been developed. Moreover, a user study aimed
at evaluating human performance in hand drawn symbol autocompletion
has been presented. Using the set of symbols from the Military Course of
Action domain, the user study evaluates the conditions under which the
users are willing to exploit the autocompletion functionality and those under
which they can use it efficiently. The results show that the autocompletion
functionality can be used in a profitable way, with a drawing time saving of
about 18%.
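The subgraph-isomorphism step can be illustrated independently of the descriptor: a partially drawn symbol matches a template when its graph of stroke primitives maps injectively onto the template's. The brute-force check below is only a sketch on unlabeled graphs; the thesis additionally compares primitives via the polar-histogram descriptor:

```python
from itertools import permutations

def subgraph_isomorphic(small_edges, big_edges, small_n, big_n):
    """Check whether the `small` graph appears as a subgraph of `big`.

    Brute-force search over injective vertex mappings: a mapping works when
    every edge of the small graph maps onto an edge of the big graph.
    Exponential, so only suitable for the handful of primitives in a symbol.
    """
    big = {frozenset(e) for e in big_edges}
    for perm in permutations(range(big_n), small_n):
        if all(frozenset((perm[u], perm[v])) in big for u, v in small_edges):
            return True
    return False

# A partially drawn triangle (two of its three edges) matched inside a
# complete triangle-plus-tail template.
partial = [(0, 1), (1, 2)]
full = [(0, 1), (1, 2), (2, 0), (2, 3)]
print(subgraph_isomorphic(partial, full, 3, 4))  # → True
```

In the autocompletion setting, this is what allows a few early strokes to retrieve every template symbol they could still complete to.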
The fourth contribution regards the detection of the graphical context of
hand drawn symbols, and in particular, the development of an approach for
identifying attachment areas on sketched symbols. In the field of syntactic
recognition of hand drawn visual languages, the recognition of the relations
among graphical symbols is one of the first important tasks to be accomplished;
it is usually reduced to recognizing the attachment areas of each symbol and
the relations among them. The approach is independent of the method used
to recognize symbols and assumes that the symbol has already been recognized.
The approach is evaluated through a user study aimed at comparing the
attachment areas detected by the system with those identified by the users. The
results show that the system can identify attachment areas with a reasonable
accuracy.
The last contribution is EulerSketch, an interactive system for the sketching
and interpretation of Euler diagrams (EDs). The interpretation of a hand
drawn ED produces two types of text encodings of the ED topology called
static code and ordered Gauss paragraph (OGP) code, and a further encoding
of its regions. Given the topology of an ED expressed through static or OGP
code, EulerSketch automatically generates a new topologically equivalent ED
in its graphical representation.
Interactive rewriting for non-native English speakers
Tohoku University, Doctor of Philosophy (Information Sciences) thesis
Deep interactive text prediction and quality estimation in translation interfaces
The output of automatic translation systems is usually destined for human consumption. In most cases, translators use machine translation (MT) as the first step in the process of creating a fluent translation in a target language given a text in a source language. However, there are many possible ways for translators to interact with MT. The goal of this thesis is to investigate new interactive designs and interfaces for translation.
In the first part of the thesis, we present pilot studies which investigate aspects of the interactive translation process, building upon insights from Human-Computer Interaction (HCI) and Translation Studies. We developed HandyCAT, an open-source platform for translation process research, which was used to conduct two user studies: an investigation into interactive machine translation and evaluation of a novel component for post-editing.
We then propose new models for quality estimation (QE) of MT, and new models for estimating the confidence of prefix-based neural interactive MT (IMT) systems. We present a series of experiments using neural sequence models for QE and IMT. We focus upon token-level QE models, which can be used as standalone components or integrated into post-editing pipelines, guiding users in selecting phrases to edit. We introduce a strong recurrent baseline for neural QE, and show how state-of-the-art automatic post-editing (APE) models can be re-purposed for word-level QE. We also propose an auxiliary confidence model, which can be attached to (I)MT systems to use the model's internal state to estimate confidence about the model's predictions.
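As background on token-level QE: the OK/BAD labels such models are trained on are conventionally derived by aligning MT output with its post-edit. A minimal sketch using a plain diff (shared tasks typically use TER-based alignment instead):

```python
from difflib import SequenceMatcher

def word_qe_labels(mt_tokens, pe_tokens):
    """Label each MT token OK/BAD by aligning it with the post-edited text.

    Tokens kept verbatim in the post-edit are OK; tokens the editor replaced
    or deleted are BAD. A word-level QE model then learns to predict these
    labels without seeing the post-edit.
    """
    labels = ["BAD"] * len(mt_tokens)
    sm = SequenceMatcher(a=mt_tokens, b=pe_tokens, autojunk=False)
    for tag, i1, i2, _, _ in sm.get_opcodes():
        if tag == "equal":          # this span survived post-editing
            for i in range(i1, i2):
                labels[i] = "OK"
    return labels

mt = "the cat sat in the mat".split()
pe = "the cat sat on the mat".split()
print(word_qe_labels(mt, pe))  # → ['OK', 'OK', 'OK', 'BAD', 'OK', 'OK']
```

In an interface, BAD tokens are exactly the spans a post-editing pipeline would highlight for the user to edit first.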
The third part of the thesis introduces lexically constrained decoding using grid beam search (GBS), a means of expanding prefix-based interactive translation to general lexical constraints. By integrating lexically constrained decoding with word-level QE, we then suggest a novel interactive design for translation interfaces, and test our hypotheses using simulated editing. The final section focuses upon designing an interface for interactive post-editing, incorporating both GBS and QE. We design components which introduce a new way of interacting with translation models, and test these components in a user study.
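The core of grid beam search can be sketched in a few lines: hypotheses live in a grid indexed by how many lexical constraints they have covered, at each step a hypothesis either extends freely or consumes the next unmet constraint, and only fully constrained hypotheses may be returned. The toy scorer below stands in for a real translation model, and this sketch omits refinements such as multi-token constraints:

```python
def grid_beam_search(score, vocab, constraints, max_len, beam_size=2):
    """Minimal lexically constrained decoding in the spirit of GBS.

    score(seq, tok) is any log-probability-like scoring function.
    Requires max_len >= len(constraints) so all constraints can be placed.
    """
    n_c = len(constraints)
    grid = {c: [] for c in range(n_c + 1)}   # grid[c]: hyps covering c constraints
    grid[0] = [((), 0.0)]
    for _ in range(max_len):
        new_grid = {c: [] for c in range(n_c + 1)}
        for c, beam in grid.items():
            for seq, s in beam:
                for tok in vocab:             # ordinary extension, same row
                    new_grid[c].append((seq + (tok,), s + score(seq, tok)))
                if c < n_c:                   # consume a constraint, move up a row
                    tok = constraints[c]
                    new_grid[c + 1].append((seq + (tok,), s + score(seq, tok)))
        for c in new_grid:                    # prune each grid cell to the beam size
            grid[c] = sorted(new_grid[c], key=lambda h: h[1], reverse=True)[:beam_size]
    return max(grid[n_c], key=lambda h: h[1])[0]

# Toy scorer: prefer "b" after "a", mildly penalize everything else.
def score(seq, tok):
    return 0.0 if seq and seq[-1] == "a" and tok == "b" else -1.0

out = grid_beam_search(score, vocab=["a", "b"], constraints=["x"], max_len=3)
print(out)  # → ('a', 'b', 'x')
```

The interactive payoff is that a user-supplied word or phrase is guaranteed to appear in the output while the rest of the translation is still chosen by the model.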
Progress report on user interface studies, cognitive and user modelling
This WP presents the empirical foundations for the development of the CasMaCat workbench.
A series of experiments are being run to establish basic facts about translator behaviour in
computer-aided translation, focusing on the use of visualization options and input modalities
while post-editing machine translation (sections 1 and 2). Another series of studies deals with
cognitive modelling and individual differences in translation production, in particular translator
types and translation/post-editing styles (sections 3 and 4).
This deliverable, D1.2, is a progress report on user interface studies, cognitive and user
modelling. It reports on post-editing and interactive translation experiments, as well as cognitive
modelling covering Tasks 1.1, 1.2, 1.3 and 1.5. It also addresses the issues that were raised in
the last review report for the project period M1 to M12, in particular:
the basic facts about translator behaviour in CAT (sections 1 and 4), highlighting
usage of visualization and input modalities (see also D5.3).
the individual differences in translator types and translation styles (section 3; see also
terminology, section A.1)
the results and conclusions of preliminary studies conducted to investigate post-editing
and translation styles (sections 2 and 5)
From the experiments and analyses so far, it is clear that the data collected in the CRITT
TPR-DB (Translation Process Research database) is an essential resource for achieving
the CasMaCat project goals. It allows large-scale, in-depth studies of human translation processes
and thus serves as a basis for empirically grounded future development of the
CasMaCat workbench. It attracts an international research community to investigate human
translation processes under various conditions and to arrive at a more advanced level of understanding.
Additional language pairs and more data increase the chances of better underpinning the
needed conclusions, as will be shown in this report and as concluded in section 5.
HairBrush for Immersive Data-Driven Hair Modeling
While hair is an essential component of virtual humans, it is also one of the most challenging digital assets to create. Existing automatic techniques lack the generality and flexibility to create rich hair variations, while manual authoring interfaces often require considerable artistic skill and effort, especially for intricate 3D hair structures that can be difficult to navigate. We propose an interactive hair modeling system that can help create complex hairstyles in minutes or hours that would otherwise take much longer with existing tools. Modelers, including novice users, can focus on the overall hairstyle and local hair deformations, as our system intelligently suggests the desired hair parts. Our method combines the flexibility of manual authoring with the convenience of data-driven automation. Since hair contains intricate 3D structures such as buns, knots, and strands, it is inherently challenging to create using traditional 2D interfaces. Our system provides a new 3D hair authoring interface for immersive interaction in virtual reality (VR). Users can draw high-level guide strips, from which our system predicts the most plausible hairstyles via a deep neural network trained on a professionally curated dataset. Each hairstyle in our dataset is composed of multiple variations, serving as blend shapes to fit the user drawings via global blending and local deformation. The fitted hair models are visualized as interactive suggestions that the user can select, modify, or ignore. We conducted a user study to confirm that our system can significantly reduce manual labor while improving output quality for modeling a variety of head and facial hairstyles that are challenging to create via existing techniques.
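The global-blending step rests on a standard idea: fit blend-shape weights by least squares so the blended geometry approximates the user's guide strips. A single-weight sketch of that idea follows; the actual system fits many variations plus local deformation, and every name here is illustrative rather than taken from the paper:

```python
def fit_blend_weight(base, delta, target):
    """Least-squares weight w minimizing ||(base + w*delta) - target||^2.

    base, delta, target are flat coordinate lists of equal length. This is
    the one-variation case of fitting blend-shape weights to a user drawing.
    Closed form: w = <delta, target - base> / <delta, delta>.
    """
    num = sum(d * (t - b) for d, b, t in zip(delta, base, target))
    den = sum(d * d for d in delta)
    return num / den

base = [0.0, 0.0, 1.0, 0.0]      # a tiny two-point "hair strand" (x0,y0,x1,y1)
delta = [0.0, 1.0, 0.0, 1.0]     # variation: lift both points upward
target = [0.0, 0.5, 1.0, 0.5]    # user drawing: halfway lifted
print(fit_blend_weight(base, delta, target))  # → 0.5
```

With several variations this becomes an ordinary multi-variable least-squares solve, and the fitted shape can then be refined by local deformation toward the strokes.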
Improving the translation environment for professional translators
When using computer-aided translation systems in a typical, professional translation workflow, there are several stages at which there is room for improvement. The SCATE (Smart Computer-Aided Translation Environment) project investigated several of these aspects, both from a human-computer interaction point of view, as well as from a purely technological side.
This paper describes the SCATE research with respect to improved fuzzy matching, parallel treebanks, the integration of translation memories with machine translation, quality estimation, terminology extraction from comparable texts, the use of speech recognition in the translation process, and human-computer interaction and interface design for the professional translation environment. For each of these topics, we describe the experiments we performed and the conclusions drawn, providing an overview of the highlights of the entire SCATE project.
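As a concrete reference point for the fuzzy-matching topic, a baseline translation-memory lookup can be sketched with a generic string-similarity ratio. SCATE's contribution is precisely richer, linguistically informed matching than this, so the snippet below is only a stand-in baseline with invented example data:

```python
from difflib import SequenceMatcher

def best_tm_match(source, memory, threshold=0.6):
    """Return the translation-memory entry most similar to `source`.

    memory: list of (source_segment, target_segment) pairs. Each stored
    segment is scored with a character-level similarity ratio; the best
    (source, target, score) triple clearing the threshold is returned,
    or None when nothing is close enough.
    """
    best = None
    best_score = threshold
    for src, tgt in memory:
        score = SequenceMatcher(a=source.lower(), b=src.lower()).ratio()
        if score >= best_score:
            best, best_score = (src, tgt, score), score
    return best

memory = [
    ("Close the door.", "Ferme la porte."),
    ("Open the window.", "Ouvre la fenêtre."),
]
match = best_tm_match("Close the doors.", memory)
print(match)  # the near-identical "Close the door." entry wins
```

CAT tools usually report this score as a fuzzy-match percentage, so a 96% match here would be offered to the translator for light editing rather than retranslation.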
Contextual cues for deep learning models of code
Source code provides an exciting application area for deep learning methods, encompassing tasks like program synthesis, repair, and analysis, as well as tasks at the intersection of code and natural language. Although deep learning models for code, particularly large language models, have recently seen significant success, they can face challenges in generalizing to unseen code. This can lead to inaccuracies, especially when working with repositories that contain proprietary software or work-in-progress code.
The main focus of this thesis is to effectively harness useful signals from the available context such that it can improve the performance of the deep learning models of code at the given task. By incorporating these contextual cues, the model's generalization capabilities are amplified, providing additional insights not evident from the original input and directing its focus toward essential details. Furthermore, the use of contextual cues aids in adapting to new tasks and boosts performance on existing ones by making more context-aware predictions. To achieve this, we present a general framework comprising two stages: (a) Context Enhancement, which involves enriching the input with support context obtained through the identification and selection of relevant contextual cues, and (b) Prediction using the Enhanced Context, where we leverage the support context combined with the input to make accurate predictions. The thesis presents four articles that propose diverse approaches for these stages.
The first article breaks the standard problem of programming by examples into two stages: (a) finding programs that satisfy individual examples (per-example solutions) and, (b) combining these per-example solutions by leveraging their program execution states to find a program that satisfies all given examples.
The second article proposes an approach for selecting targeted information from the current file and using it to adapt the code completion model to an unseen, local context.
The third article builds upon the second article by leveraging contextual cues from the entire code repository using a set of prompt proposals that govern the location and content of the context that should be taken from the repository. We propose a framework to select the most relevant prompt proposal context which is then used to prompt a large language model of code to generate predictions for the tokens in the rest of the line following the cursor in a file.
The fourth article extends the third article by proposing a framework that learns to combine multiple diverse contexts from the repository. We show that training smaller models of code this way performs better than or on par with significantly larger models that are not trained with repository context.
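A deliberately simple stand-in for the repository-context idea: rank candidate snippets by identifier overlap with the code before the cursor and prepend the best ones to the model's prompt. The thesis instead learns which prompt-proposal contexts to select; the Jaccard heuristic and example data here are illustrative only:

```python
import re

def rank_context(cursor_context, candidates, k=1):
    """Rank candidate repository snippets against the code before the cursor.

    Scores each candidate by Jaccard similarity over identifier tokens and
    returns the top-k snippets, which a caller could prepend to the prompt
    of a code completion model.
    """
    def toks(s):
        return set(re.findall(r"[A-Za-z_]\w*", s))
    query = toks(cursor_context)
    def jaccard(c):
        cand = toks(c)
        union = query | cand
        return len(query & cand) / len(union) if union else 0.0
    return sorted(candidates, key=jaccard, reverse=True)[:k]

hole = "def load_user(user_id):\n    record = db."
repo_snippets = [
    "def save_user(user_id, record): db.put(user_id, record)",
    "def render_page(template): return template.format()",
]
print(rank_context(hole, repo_snippets))  # the db/user_id snippet ranks first
```

Even this crude retrieval illustrates why repository context helps: the selected snippet reveals project-specific names (`db.put`, `record`) that the model could not guess from the current file alone.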
Designing for Nurse-AI Collaboration in Triage
The Local Emergency Medical Communication Centers (LEMCs) play a crucial role in the Norwegian healthcare system by receiving calls for immediate medical assistance. Registered nurses answer the calls, and their task is to assess the situation and triage the caller into appropriate triage levels indicating when and how help should be provided. Telephone triage poses challenges due to the limitations of audio communication, time sensitivity, and complex decision-making. Additionally, nurses often face the burden of managing clinical tools across multiple interfaces. This thesis explored how to design a system to support nurses in telephone triage and how we can facilitate nurse-AI collaboration in the process. A Research through Design (RtD) methodology was employed, and an iterative design approach was utilized. The research investigated the design aspects of AI-based suggestions and the use of natural language when creating semi-structured documentation. Four prototype iterations were developed throughout the study, and researchers from RE-AIMED and telephone operators conducted evaluations of the prototypes. Designing a tool for telephone triage requires understanding the user's needs and workflow. It is, therefore, crucial to involve telephone operators in the design process. The prototype demonstrated how we could design for incorporating AI in the triage process, and this thesis explores the various considerations when designing for nurse-AI collaboration. One notable finding was the importance of enabling documentation in natural language, as relying solely on structured documentation may fail to capture the caller's specific situation. Additionally, it is important to design a system that facilitates documentation of patient-initiated information and questions initiated by the nurses or the system.
Master's thesis in Information Science (INFO390MASV-INF)
Artificial Intelligence Meets Virtual and Augmented Worlds (AIVR), in conjunction with SIGGRAPH Asia
- …