438 research outputs found
An Enhanced Visualization Process Model for Incremental Visualization
With today’s technical possibilities, a stable visualization scenario can no longer be assumed as a matter of course, as underlying data and targeted display setup are much more in flux than in traditional scenarios. Incremental visualization approaches are a means to address this challenge, as they permit the user to interact with, steer, and change the visualization at intermediate time points and not just after it has been completed. In this paper, we put forward a model for incremental visualizations that is based on the established Data State Reference Model, but extends it to also represent partitioned data and visualization operators to facilitate intermediate visualization updates. Partitioned data and operators can be used independently or in combination to strike tailored compromises between output quality, shown data quantity, and responsiveness (i.e., frame rates). We showcase the new expressive power of this model by discussing the opportunities and challenges of incremental visualization in general and its usage in a real-world scenario in particular.
Towards Exascale Scientific Metadata Management
Advances in technology and computing hardware are enabling scientists from
all areas of science to produce massive amounts of data using large-scale
simulations or observational facilities. In this era of data deluge, effective
coordination between the data production and the analysis phases hinges on the
availability of metadata that describe the scientific datasets. Existing
workflow engines have been capturing a limited form of metadata to provide
provenance information about the identity and lineage of the data. However,
much of the data produced by simulations, experiments, and analyses still needs
to be annotated manually in an ad hoc manner by domain scientists. Systematic
and transparent acquisition of rich metadata becomes a crucial prerequisite to
sustain and accelerate the pace of scientific innovation. Yet, ubiquitous and
domain-agnostic metadata management infrastructure that can meet the demands of
extreme-scale science is notable by its absence.
To address this gap in scientific data management research and practice, we
present our vision for an integrated approach that (1) automatically captures
and manipulates information-rich metadata while the data is being produced or
analyzed and (2) stores metadata within each dataset to permeate
metadata-oblivious processes and to query metadata through established and
standardized data access interfaces. We motivate the need for the proposed
integrated approach using applications from plasma physics, climate modeling
and neuroscience, and then discuss research challenges and possible solutions.
Hi, how can I help you?: Automating enterprise IT support help desks
Question answering is one of the primary challenges of natural language
understanding. In realizing such a system, providing complex, long answers to
questions is more challenging than factoid answering because the former
requires context disambiguation. The different methods explored in the literature
can be broadly classified into three categories: 1) classification
based, 2) knowledge graph based, and 3) retrieval based. Individually, none of
them addresses the needs of an enterprise-wide assistance system for the IT
support and maintenance domain. In this domain, the variance of answers is
large, ranging from factoid answers to structured operating procedures, and the
knowledge is spread across heterogeneous data sources such as
application-specific documentation and ticket management systems. No single
technique for general-purpose assistance scales to such a landscape. To address
this, we have built a cognitive platform with capabilities adapted for this
domain. Further,
we have built a general purpose question answering system leveraging the
platform that can be instantiated for multiple products and technologies in the
support domain. The system uses a novel hybrid answering model that
orchestrates across a deep learning classifier, a knowledge graph based context
disambiguation module and a sophisticated bag-of-words search system. This
orchestration performs context switching for a provided question and also does
a smooth hand-off of the question to a human expert if none of the automated
techniques can provide a confident answer. This system has been deployed across
675 internal enterprise IT support and maintenance projects.
Comment: To appear in IAAI 201
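The orchestration with human hand-off described in this abstract can be illustrated with a minimal sketch. It is not the authors' system: the `Answer` type, the three stub answerers, the canned answers, and the 0.7 confidence threshold are all hypothetical and only stand in for the classifier, knowledge graph, and search components named above.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Answer:
    text: str
    confidence: float  # assumed to lie in [0, 1]
    source: str


# An answerer maps a question to an Answer, or None if it has nothing to offer.
Answerer = Callable[[str], Optional[Answer]]


def classifier_answerer(question: str) -> Optional[Answer]:
    # Hypothetical stand-in for a trained classifier over known intents.
    if "password" in question.lower():
        return Answer("Use the self-service reset portal.", 0.9, "classifier")
    return None


def knowledge_graph_answerer(question: str) -> Optional[Answer]:
    # Hypothetical stand-in for knowledge-graph-based context disambiguation.
    if "vpn" in question.lower():
        return Answer("Which VPN client are you using?", 0.75, "knowledge_graph")
    return None


def search_answerer(question: str) -> Optional[Answer]:
    # Hypothetical stand-in for a bag-of-words search; always low confidence here.
    return Answer("See the general troubleshooting guide.", 0.3, "search")


def orchestrate(question: str, answerers: List[Answerer],
                threshold: float = 0.7) -> Answer:
    """Try each answerer in order; hand off to a human if none is confident."""
    for answerer in answerers:
        candidate = answerer(question)
        if candidate is not None and candidate.confidence >= threshold:
            return candidate
    return Answer("Routing your question to a human expert.", 0.0, "human")


PIPELINE = [classifier_answerer, knowledge_graph_answerer, search_answerer]
```

Calling `orchestrate("My printer is on fire", PIPELINE)` falls through all three stubs and returns the human hand-off, because the only candidate (from the search stub) is below the threshold.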
Prioritization of Software and System Requirements through Natural Language Processing for Testing Software
Master's thesis, Data Science, Universidade de Lisboa, Faculdade de Ciências, 2021
Safety-critical systems have been a constant and growing presence in industrial production, for example in railways and vehicles. These systems are highly configurable and must be intensively tested by system engineers before being delivered to customers. This process is highly time-consuming and might require associations between the product features and the requirements demanded by customers. Requirement prioritization seeks to identify the most relevant requirements of a system, aiming to reduce the cost and time of the testing process. Machine Learning has proven useful in helping engineers with this task by automating associations between features and requirements. However, its application can be more difficult when requirements are written in natural language and no ground truth dataset exists for them. In our work, we present ARRINA, a Natural Language Processing-based recommendation system able to extract components of safety-critical systems and associate them with their specifications written in natural language, and to process customer requirements and map them to components. The system integrates a Weight Association Rule Mining framework to extract the components and their associations and generates visualizations that can help engineers understand which components are generally introduced in project requirements. The system also includes a recommendation framework that can associate input requirements with existing subsystems, reducing engineers’ effort in requirement analysis and prioritization. We performed several experiments to evaluate the different components of ARRINA over four railway subsystems and input requirements. As a result, the system achieved 90% accuracy, which indicates its value in reducing the time engineers spend discovering the correct subsystem links and prioritizing requirements for the testing process.
Waypoint-Based Imitation Learning for Robotic Manipulation
While imitation learning methods have seen a resurgent interest for robotic
manipulation, the well-known problem of compounding errors continues to afflict
behavioral cloning (BC). Waypoints can help address this problem by reducing
the horizon of the learning problem for BC, and thus, the errors compounded
over time. However, waypoint labeling is underspecified, and requires
additional human supervision. Can we generate waypoints automatically without
any additional human supervision? Our key insight is that if a trajectory
segment can be approximated by linear motion, the endpoints can be used as
waypoints. We propose Automatic Waypoint Extraction (AWE) for imitation
learning, a preprocessing module to decompose a demonstration into a minimal
set of waypoints which when interpolated linearly can approximate the
trajectory up to a specified error threshold. AWE can be combined with any BC
algorithm, and we find that AWE can increase the success rate of
state-of-the-art algorithms by up to 25% in simulation and by 4-28% on
real-world bimanual manipulation tasks, reducing the decision making horizon by
up to a factor of 10. Videos and code are available at
https://lucys0.github.io/awe/
Comment: The first two authors contributed equall
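The key idea in this abstract, choosing a minimal set of waypoints whose linear interpolation stays within an error threshold of the demonstration, can be sketched with a small dynamic program. This is an illustrative reconstruction under stated assumptions, not the released AWE code: the error measure (maximum point-to-segment Euclidean distance) and the function names are assumptions made for this example.

```python
import math


def point_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b (any dimension)."""
    ab = [bi - ai for ai, bi in zip(a, b)]
    ap = [pi - ai for ai, pi in zip(a, p)]
    denom = sum(c * c for c in ab)
    # Projection parameter, clamped so the closest point lies on the segment.
    t = 0.0 if denom == 0 else max(0.0, min(1.0, sum(x * y for x, y in zip(ap, ab)) / denom))
    closest = [ai + t * c for ai, c in zip(a, ab)]
    return math.dist(p, closest)


def segment_error(traj, i, j):
    """Largest deviation of the points strictly between i and j from segment i-j."""
    return max(
        (point_segment_distance(traj[k], traj[i], traj[j]) for k in range(i + 1, j)),
        default=0.0,
    )


def extract_waypoints(traj, eps):
    """Return indices of a smallest waypoint set whose linear interpolation
    deviates from the trajectory by at most eps (eps >= 0).
    The first and last states are always kept."""
    n = len(traj)
    if n == 0:
        return []
    best = [math.inf] * n  # best[j]: fewest segments needed to reach index j
    prev = [-1] * n        # prev[j]: previous waypoint on an optimal path to j
    best[0] = 0
    for j in range(1, n):
        for i in range(j):
            if best[i] + 1 < best[j] and segment_error(traj, i, j) <= eps:
                best[j] = best[i] + 1
                prev[j] = i
    # Walk back from the final state to recover the chosen indices.
    indices = []
    j = n - 1
    while j != -1:
        indices.append(j)
        j = prev[j]
    return indices[::-1]
```

For an L-shaped path such as `[(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]` with a small threshold, the sketch keeps only the two endpoints and the corner. The triple loop is cubic in trajectory length, which is acceptable for a sketch but would need pruning for long demonstrations.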
Decoupling algorithms from schedules for easy optimization of image processing pipelines
Using existing programming tools, writing high-performance image processing code requires sacrificing readability, portability, and modularity. We argue that this is a consequence of conflating what computations define the algorithm, with decisions about storage and the order of computation. We refer to these latter two concerns as the schedule, including choices of tiling, fusion, recomputation vs. storage, vectorization, and parallelism.
We propose a representation for feed-forward imaging pipelines that separates the algorithm from its schedule, enabling high performance without sacrificing code clarity. This decoupling simplifies the algorithm specification: images and intermediate buffers become functions over an infinite integer domain, with no explicit storage or boundary conditions. Imaging pipelines are compositions of functions. Programmers separately specify scheduling strategies for the various functions composing the algorithm, which allows them to efficiently explore different optimizations without changing the algorithmic code.
We demonstrate the power of this representation by expressing a range of recent image processing applications in an embedded domain specific language called Halide, and compiling them for ARM, x86, and GPUs. Our compiler targets SIMD units, multiple cores, and complex memory hierarchies. We demonstrate that it can handle algorithms such as a camera raw pipeline, the bilateral grid, fast local Laplacian filtering, and image segmentation. The algorithms expressed in our language are both shorter and faster than state-of-the-art implementations.
National Science Foundation (U.S.) (Grant 0964004); National Science Foundation (U.S.) (Grant 0964218); National Science Foundation (U.S.) (Grant 0832997); United States. Dept. of Energy (Award DE-SC0005288); Cognex Corporation; Adobe System
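The separation of algorithm and schedule can be illustrated with a toy sketch in plain Python. It does not use Halide or its API: `build_blur`, the two schedule names, and the synthetic `image` function are hypothetical. The algorithm (a two-stage 3-tap blur over an unbounded integer domain) is written once, and the schedule only decides whether the intermediate stage is recomputed at every use or stored after its first evaluation.

```python
from functools import lru_cache


def image(x, y):
    """Hypothetical input defined over the whole integer plane (no bounds)."""
    return float((7 * x + 13 * y) % 17)


def build_blur(input_fn, schedule):
    """Algorithm: horizontal 3-tap blur, then vertical 3-tap blur.
    Schedule: 'inline' recomputes blur_x at every use,
              'store' memoizes blur_x so each value is computed once."""
    if schedule not in ("inline", "store"):
        raise ValueError("schedule must be 'inline' or 'store'")

    def blur_x(x, y):
        return (input_fn(x - 1, y) + input_fn(x, y) + input_fn(x + 1, y)) / 3.0

    if schedule == "store":
        blur_x = lru_cache(maxsize=None)(blur_x)

    def blur_y(x, y):
        return (blur_x(x, y - 1) + blur_x(x, y) + blur_x(x, y + 1)) / 3.0

    return blur_y


def realize(fn, width, height):
    """Evaluate a pipeline over a finite output region."""
    return [[fn(x, y) for x in range(width)] for y in range(height)]


def run_and_count(schedule, width=4, height=4):
    """Realize the pipeline and count how often the input is read."""
    reads = 0

    def counted(x, y):
        nonlocal reads
        reads += 1
        return image(x, y)

    out = realize(build_blur(counted, schedule), width, height)
    return out, reads
```

Both schedules produce identical pixels; only the amount of recomputation differs, which is the trade-off (recomputation vs. storage) that the abstract assigns to the schedule. A real scheduler would also cover tiling, vectorization, and parallelism, which this sketch omits.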
Reordering in statistical machine translation
PhD
Machine translation is a challenging task whose difficulties arise from several characteristics
of natural language. The main focus of this work is on reordering as one of
the major problems in MT and statistical MT, which is the method investigated in this
research. The reordering problem in SMT originates from the fact that not all the words
in a sentence can be consecutively translated. This means words must be skipped and
be translated out of their order in the source sentence to produce a fluent and grammatically
correct sentence in the target language. The main reason that reordering is
needed is the fundamental word order differences between languages. Reordering
therefore becomes a more dominant issue the more the source and target languages
differ structurally.
The aim of this thesis is to study the reordering phenomenon by proposing new methods
of dealing with reordering in SMT decoders and evaluating the effectiveness of
the methods and the importance of reordering in the context of natural language processing
tasks. In other words, we propose novel ways of performing the decoding to
improve the reordering capabilities of the SMT decoder and in addition we explore
the effect of improving the reordering on the quality of specific NLP tasks, namely
named entity recognition and cross-lingual text association. Meanwhile, we go beyond
reordering in text association and present a method to perform cross-lingual text fragment
alignment, based on models of divergence from randomness.
The main contribution of this thesis is a novel method named dynamic distortion,
which is designed to improve the ability of the phrase-based decoder in performing
reordering by adjusting the distortion parameter based on the translation context. The
model employs a discriminative reordering model, which combines several features,
including lexical and syntactic ones, to predict the necessary distortion limit for each
sentence and each hypothesis expansion. The discriminative reordering model is also
integrated into the decoder as an extra feature. The method achieves substantial improvements
over the baseline without increasing the decoding time by avoiding reordering
in unnecessary positions.
Another novel method is also presented to extend the phrase-based decoder to dynamically
chunk, reorder, and apply phrase translations in tandem. Words inside the chunks
are moved together to enable the decoder to make long-distance reorderings to capture
the word order differences between languages with different sentence structures.
Another aspect of this work is the task-based evaluation of the reordering methods and
other translation algorithms used in the phrase-based SMT systems. With more successful
SMT systems, performing multi-lingual and cross-lingual tasks through translating
becomes more feasible. We have devised a method to evaluate the performance
of state-of-the-art named entity recognisers on the text translated by an SMT decoder.
Specifically, we investigated the effect of word reordering and incorporating reordering
models in improving the quality of named entity extraction.
In addition to empirically investigating the effect of translation in the context of cross-lingual
document association, we have described a text fragment alignment algorithm
to find sections of the two documents in different languages that are content-wise related.
The algorithm uses similarity measures based on divergence from randomness
and word-based translation models to perform text fragment alignment on a collection
of documents in two different languages.
All the methods proposed in this thesis are extensively empirically examined. We have
tested all the algorithms on common translation collections used in different evaluation
campaigns. Well known automatic evaluation metrics are used to compare the
suggested methods to a state-of-the-art baseline, and the results are analysed and discussed.
Unsupervised Chunking with Hierarchical RNN
In Natural Language Processing (NLP), predicting linguistic structures, such
as parsing and chunking, has mostly relied on manual annotations of syntactic
structures. This paper introduces an unsupervised approach to chunking, a
syntactic task that involves grouping words in a non-hierarchical manner. We
present a two-layer Hierarchical Recurrent Neural Network (HRNN) designed to
model word-to-chunk and chunk-to-sentence compositions. Our approach involves a
two-stage training process: pretraining with an unsupervised parser and
finetuning on downstream NLP tasks. Experiments on the CoNLL-2000 dataset
reveal a notable improvement over existing unsupervised methods, enhancing
phrase F1 score by up to 6 percentage points. Further, finetuning with
downstream tasks results in an additional performance improvement.
Interestingly, we observe that the emergence of the chunking structure is
transient during the neural model's downstream-task training. This study
contributes to the advancement of unsupervised syntactic structure discovery
and opens avenues for further research in linguistic theory.
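For readers unfamiliar with the phrase F1 metric mentioned in this abstract, the following is a minimal single-sentence sketch. It assumes untyped chunks encoded with `B` (begins a chunk) and `I` (inside a chunk) tags and counts a predicted chunk as correct only if its span matches a gold span exactly. The official CoNLL-2000 evaluation is corpus-level and uses typed chunks, so this is an illustration of the idea rather than the paper's evaluation script.

```python
def chunk_spans(tags):
    """Convert a sequence of 'B'/'I' tags into a set of (start, end) spans,
    with end exclusive. A leading 'I' is treated as starting a chunk."""
    spans = set()
    start = None
    for i, tag in enumerate(tags):
        if tag not in ("B", "I"):
            raise ValueError(f"unexpected tag {tag!r} at position {i}")
        if tag == "B" or start is None:
            if start is not None:
                spans.add((start, i))  # close the previous chunk
            start = i
    if start is not None:
        spans.add((start, len(tags)))  # close the final chunk
    return spans


def phrase_f1(gold_tags, pred_tags):
    """Harmonic mean of span-level precision and recall for one sentence."""
    gold = chunk_spans(gold_tags)
    pred = chunk_spans(pred_tags)
    true_positives = len(gold & pred)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)
```

With gold tags `B I B B I` and predicted tags `B I B I I`, only the first chunk matches, giving precision 1/2, recall 1/3, and an F1 of 0.4.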