Visual Integration of Data and Model Space in Ensemble Learning
Ensembles of classifier models typically deliver superior performance and can
outperform single classifier models on a given dataset and classification
task. However, the gain in performance comes at the cost of comprehensibility,
making it challenging to understand how each model affects the classification
outputs and where the errors come from. We propose a tight
visual integration of the data and the model space for exploring and combining
classifier models. We introduce a workflow that builds upon the visual
integration and enables the effective exploration of classification outputs and
models. We then present a use case in which we start with an ensemble
automatically selected by a standard ensemble selection algorithm, and show how
we can manipulate models and explore alternative combinations.
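The abstract does not specify which ensemble selection algorithm provides the starting point. As one plausible reading, the following is a minimal sketch of greedy ensemble selection in the style of Caruana et al.; all names and the accuracy objective are illustrative assumptions, not the authors' implementation.

```python
# Sketch of greedy ensemble selection (Caruana-style): repeatedly add the
# model (with replacement) that most improves the averaged ensemble's
# accuracy. Illustrative only; the paper may use a different variant.
import numpy as np

def greedy_ensemble_selection(predictions, y_true, n_rounds=10):
    """predictions: list of (n_samples, n_classes) probability arrays,
                    one per candidate model.
       y_true:      (n_samples,) integer class labels.
       Returns the indices of the selected models (repeats allowed)."""
    selected = []
    ensemble_sum = np.zeros_like(predictions[0])
    for _ in range(n_rounds):
        best_idx, best_acc = None, -1.0
        for i, p in enumerate(predictions):
            # Accuracy of the ensemble if model i were added next.
            candidate = (ensemble_sum + p) / (len(selected) + 1)
            acc = np.mean(candidate.argmax(axis=1) == y_true)
            if acc > best_acc:
                best_idx, best_acc = i, acc
        selected.append(best_idx)
        ensemble_sum += predictions[best_idx]
    return selected
```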
Uncertainty Quantification Using Neural Networks for Molecular Property Prediction
Uncertainty quantification (UQ) is an important component of molecular
property prediction, particularly for drug discovery applications where model
predictions direct experimental design and where unanticipated imprecision
wastes valuable time and resources. The need for UQ is especially acute for
neural models, which are becoming increasingly standard yet are challenging to
interpret. While several approaches to UQ have been proposed in the literature,
there is no clear consensus on the comparative performance of these models. In
this paper, we study this question in the context of regression tasks. We
systematically evaluate several methods on five benchmark datasets using
multiple complementary performance metrics. Our experiments show that none of
the methods we tested is unequivocally superior to all others, and none
produces a particularly reliable ranking of errors across multiple datasets.
While we believe these results show that existing UQ methods are not sufficient
for all common use cases and that further research is warranted, we conclude
with a practical recommendation as to which existing techniques seem to
perform well relative to others.
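The paper does not spell out its metrics here. As a sketch of the kind of evaluation the abstract alludes to, the code below pairs a simple ensemble-based UQ estimate with a rank-correlation check of whether predicted uncertainty tracks actual error; the function names and the choice of Spearman correlation are assumptions for illustration.

```python
# Illustrative UQ evaluation: does a method's predicted uncertainty
# reliably rank its errors? Not the paper's code.
import numpy as np
from scipy.stats import spearmanr

def deep_ensemble_uncertainty(models, X):
    """Ensemble-based UQ: mean prediction and per-sample std across
    members. `models` is any list of fitted regressors with .predict."""
    preds = np.stack([m.predict(X) for m in models])  # (n_models, n_samples)
    return preds.mean(axis=0), preds.std(axis=0)

def error_ranking_score(y_true, y_pred, y_std):
    """Spearman rank correlation between predicted uncertainty and
    absolute error; values near 1 mean the UQ method reliably flags
    its own worst predictions."""
    abs_err = np.abs(y_true - y_pred)
    rho, _ = spearmanr(y_std, abs_err)
    return rho
```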
Measuring relative opinion from location-based social media: A case study of the 2016 U.S. presidential election
Social media has emerged as an alternative to opinion polls for collecting
public opinion, yet as a passive data source it still poses many challenges,
such as its lack of structure and questions of quantifiability and
representativeness.
Social media data with geotags provide new opportunities to unveil the
geographic locations of users expressing their opinions. This paper aims to
answer two questions: 1) whether quantifiable measurement of public opinion can
be obtained from social media and 2) whether it can produce better or
complementary measures compared to opinion polls. This research proposes a
novel approach to measure the relative opinion of Twitter users towards public
issues in order to accommodate more complex opinion structures and take
advantage of the geography pertaining to the public issues. To ensure that this
new measure is technically feasible, we develop a modeling framework that
includes building a training dataset with a state-of-the-art approach and
devising a new deep learning method called Opinion-Oriented Word Embedding.
With a case study of the tweets selected for the 2016 U.S. presidential
election, we demonstrate the predictive superiority of our relative opinion
approach and we show how it can aid visual analytics and support opinion
predictions. Although the relative opinion measure proves more robust than
polling, our study also suggests that the former can advantageously
complement the latter in opinion prediction.
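The Opinion-Oriented Word Embedding method itself is not reproducible from the abstract. The sketch below covers only a hypothetical downstream aggregation step: given per-tweet stance scores from some upstream classifier, it computes a per-state relative opinion. The score range, sign convention, and function name are all assumptions.

```python
# Hypothetical aggregation of geotagged tweet stances into a per-state
# relative opinion. Assumes an upstream model has scored each tweet in
# [-1, 1] (negative = candidate A, positive = candidate B).
from collections import defaultdict

def relative_opinion_by_state(tweets):
    """tweets: iterable of (state, stance_score) pairs.
    Returns mean stance per state: the sign indicates which candidate
    a state's tweets lean toward, the magnitude how strongly, without
    requiring absolute vote-share estimates as polls do."""
    totals, counts = defaultdict(float), defaultdict(int)
    for state, score in tweets:
        totals[state] += score
        counts[state] += 1
    return {s: totals[s] / counts[s] for s in totals}

# Example: a relative measure that could be compared against poll margins.
print(relative_opinion_by_state([("FL", 0.2), ("FL", -0.5), ("OH", 0.4)]))
```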
What May Visualization Processes Optimize?
In this paper, we present an abstract model of visualization and inference
processes and describe an information-theoretic measure for optimizing such
processes. In order to obtain such an abstraction, we first examined six
classes of workflows in data analysis and visualization, and identified four
levels of typical visualization components, namely disseminative,
observational, analytical and model-developmental visualization. We noticed a
common phenomenon across these levels: the transformation of data spaces
(referred to as alphabets) usually corresponds to a reduction of maximal
entropy along a workflow. Based on this observation,
we establish an information-theoretic measure of cost-benefit ratio that may be
used as a cost function for optimizing a data visualization process. To
demonstrate the validity of this measure, we examined a number of successful
visualization processes in the literature, and showed that the
information-theoretic measure can mathematically explain the advantages of such
processes over possible alternatives.
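To make the entropy bookkeeping concrete, here is a schematic sketch of the idea the abstract describes: maximal entropy (log2 of an alphabet's size) drops as data is transformed along a workflow, and that drop can be weighed against processing cost. The benefit-per-cost formula below is a simplified reading for illustration, not the paper's exact measure.

```python
# Schematic cost-benefit sketch: each workflow step maps one alphabet
# (data space) to a smaller one; benefit is taken here as the drop in
# maximal entropy. Simplified illustration, not the paper's definition.
import math

def maximal_entropy(alphabet_size):
    """Maximal Shannon entropy (bits) of an alphabet of a given size."""
    return math.log2(alphabet_size)

def cost_benefit_ratios(sizes, costs):
    """sizes: alphabet sizes along the workflow, from raw data to output.
       costs: processing cost of each of the len(sizes)-1 transformations.
       Returns benefit/cost per step."""
    ratios = []
    for i, cost in enumerate(costs):
        benefit = maximal_entropy(sizes[i]) - maximal_entropy(sizes[i + 1])
        ratios.append(benefit / cost)
    return ratios

# A raw dataset reduced to clusters and then to a binary decision:
print(cost_benefit_ratios([1_000_000, 10_000, 2], costs=[5.0, 1.0]))
```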
Model granularity and related concepts
Models are integral to engineering design and the basis for many decisions. It is therefore necessary to comprehend how a model's properties might influence its behaviour. Model granularity is an important property but has so far received only limited attention. The terminology used to describe granularity and related phenomena varies, and pertinent concepts are distributed across communities. This article positions granularity in the theoretical background of models, collects formal definitions of relevant terms from a range of communities, and discusses the implications for engineering design.