17,733 research outputs found
You can't always sketch what you want: Understanding Sensemaking in Visual Query Systems
Visual query systems (VQSs) empower users to interactively search for line
charts with desired visual patterns, typically specified using intuitive
sketch-based interfaces. Despite decades of past work on VQSs, these efforts
have not translated to adoption in practice, possibly because VQSs are largely
evaluated in unrealistic lab-based settings. To remedy this gap in adoption, we
collaborated with experts from three diverse domains---astronomy, genetics, and
material science---via a year-long user-centered design process to develop a
VQS that supports their workflow and analytical needs, and evaluate how VQSs
can be used in practice. Our study results reveal that ad-hoc sketch-only
querying is not as commonly used as prior work suggests, since analysts are
often unable to precisely express their patterns of interest. In addition, we
characterize three essential sensemaking processes supported by our enhanced
VQS. We discover that participants employ all three processes, but in different
proportions, depending on the analytical needs in each domain. Our findings
suggest that all three sensemaking processes must be integrated in order to
make future VQSs useful for a wide range of analytical inquiries.Comment: Accepted for presentation at IEEE VAST 2019, to be held October 20-25
in Vancouver, Canada. Paper will also be published in a special issue of IEEE
Transactions on Visualization and Computer Graphics (TVCG) IEEE VIS
(InfoVis/VAST/SciVis) 2019 ACM 2012 CCS - Human-centered computing,
Visualization, Visualization design and evaluation method
An Agent-Based Algorithm exploiting Multiple Local Dissimilarities for Clusters Mining and Knowledge Discovery
We propose a multi-agent algorithm able to automatically discover relevant
regularities in a given dataset, determining at the same time the set of
configurations of the adopted parametric dissimilarity measure yielding compact
and separated clusters. Each agent operates independently by performing a
Markovian random walk on a suitable weighted graph representation of the input
dataset. Such a weighted graph representation is induced by the specific
parameter configuration of the dissimilarity measure adopted by the agent,
which searches and takes decisions autonomously for one cluster at a time.
Results show that the algorithm is able to discover parameter configurations
that yield a consistent and interpretable collection of clusters. Moreover, we
demonstrate that our algorithm shows comparable performances with other similar
state-of-the-art algorithms when facing specific clustering problems
Understanding Learned Models by Identifying Important Features at the Right Resolution
In many application domains, it is important to characterize how complex
learned models make their decisions across the distribution of instances. One
way to do this is to identify the features and interactions among them that
contribute to a model's predictive accuracy. We present a model-agnostic
approach to this task that makes the following specific contributions. Our
approach (i) tests feature groups, in addition to base features, and tries to
determine the level of resolution at which important features can be
determined, (ii) uses hypothesis testing to rigorously assess the effect of
each feature on the model's loss, (iii) employs a hierarchical approach to
control the false discovery rate when testing feature groups and individual
base features for importance, and (iv) uses hypothesis testing to identify
important interactions among features and feature groups. We evaluate our
approach by analyzing random forest and LSTM neural network models learned in
two challenging biomedical applications.Comment: First two authors contributed equally to this work, Accepted for
presentation at the Thirty-Third AAAI Conference on Artificial Intelligence
(AAAI-19
Data Mining Techniques for Complex User-Generated Data
Nowadays, the amount of collected information is continuously growing in a variety of different domains. Data mining techniques are powerful instruments to effectively analyze these large data collections and extract hidden and useful knowledge.
Vast amount of User-Generated Data (UGD) is being created every day, such as user behavior, user-generated content, user exploitation of available services and user mobility in different domains. Some common critical issues arise for the UGD analysis process such as the large dataset cardinality and dimensionality, the variable data distribution and inherent sparseness, and the heterogeneous data to model the different facets of the targeted domain. Consequently, the extraction of useful knowledge from such data collections is a challenging task, and proper data mining solutions should be devised for the problem under analysis.
In this thesis work, we focus on the design and development of innovative solutions to support data mining activities over User-Generated Data characterised by different critical issues, via the integration of different data mining techniques in a unified frame- work. Real datasets coming from three example domains characterized by the above critical issues are considered as reference cases, i.e., health care, social network, and ur- ban environment domains. Experimental results show the effectiveness of the proposed approaches to discover useful knowledge from different domains
- …