What determines auditory similarity? The effect of stimulus group and methodology.
Two experiments on the internal representation of auditory stimuli compared the pairwise and grouping methodologies as means of deriving similarity judgements. A total of 45 undergraduate students participated in each experiment, judging the similarity of short auditory stimuli, using one of the methodologies. The experiments support and extend Bonebright's (1996) findings, using a further 60 stimuli. Results from both methodologies highlight the importance of category information and acoustic features, such as root mean square (RMS) power and pitch, in similarity judgements. Results showed that the grouping task is a viable alternative to the pairwise task with N > 20 sounds whilst highlighting subtle differences, such as cluster tightness, between the different task results. The grouping task is more likely to yield category information as underlying similarity judgements.
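To illustrate how a grouping task can yield similarity judgements, the sketch below scores each pair of stimuli by the proportion of participants who placed both in the same group. This co-occurrence scoring is a common convention and an assumption here; the abstract does not state how the authors scored the task. The function name `grouping_similarity` and the example data are hypothetical.

```python
from itertools import combinations


def grouping_similarity(sortings, stimuli):
    """Return, for each pair of stimuli, the share of participants
    who put both members of the pair in the same group."""
    counts = {pair: 0 for pair in combinations(sorted(stimuli), 2)}
    for groups in sortings:  # one participant's partition of the stimuli
        for group in groups:
            for pair in combinations(sorted(group), 2):
                counts[pair] += 1
    n_participants = len(sortings)
    return {pair: count / n_participants for pair, count in counts.items()}


# Hypothetical data: three participants sorting four sounds into groups.
stimuli = ["bell", "dog", "cat", "horn"]
sortings = [
    [{"bell", "horn"}, {"dog", "cat"}],
    [{"bell", "horn"}, {"dog"}, {"cat"}],
    [{"bell"}, {"horn", "dog", "cat"}],
]
similarity = grouping_similarity(sortings, stimuli)
```

A pairwise task would need a rating for every one of the N(N-1)/2 pairs, whereas each participant here produces a single partition, which is one reason a grouping task scales more easily to larger stimulus sets.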
Multiple Correspondence Analysis & the Multilogit Bilinear Model
Multiple Correspondence Analysis (MCA) is a dimension reduction method which
plays a large role in the analysis of tables with categorical nominal variables
such as survey data. Though it is usually motivated and derived using geometric
considerations, we prove that it in fact amounts to a single proximal Newton
step of a natural bilinear exponential family model for categorical data, the
multinomial logit bilinear model. We compare and contrast the behavior of MCA
with that of the model on simulations and discuss new insights on the
properties of both exploratory multivariate methods and their cognate models.
One main conclusion is that we recommend approximating the multilogit
model parameters using MCA. Indeed, estimating the parameters of the model is
not a trivial task, whereas MCA has the great advantage of being easily solved
by singular value decomposition and being scalable to large data.
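As a rough illustration of the claim that MCA is easily solved by singular value decomposition, the sketch below computes MCA in its textbook form, as correspondence analysis of the one-hot indicator matrix. It assumes NumPy is available and does not implement the paper's multilogit model or its proximal Newton step. The function name `mca` and the toy data are hypothetical.

```python
import numpy as np


def mca(codes, n_components=2):
    """Textbook MCA: correspondence analysis of the indicator matrix via SVD.

    codes: (n_rows, n_vars) array of integer category codes, one column per variable.
    """
    codes = np.asarray(codes)
    # One-hot encode each variable (observed categories only) and stack the blocks.
    blocks = [(col[:, None] == np.unique(col)[None, :]).astype(float) for col in codes.T]
    Z = np.hstack(blocks)
    P = Z / Z.sum()            # correspondence matrix
    r = P.sum(axis=1)          # row masses
    c = P.sum(axis=0)          # column masses
    # Standardized residuals from the independence model r c^T.
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sing, Vt = np.linalg.svd(S, full_matrices=False)
    k = n_components
    row_coords = (U[:, :k] * sing[:k]) / np.sqrt(r)[:, None]   # principal row coordinates
    col_coords = (Vt[:k].T * sing[:k]) / np.sqrt(c)[:, None]   # principal column coordinates
    return row_coords, col_coords, sing[:k] ** 2               # last item: principal inertias


# Hypothetical survey: five respondents, two categorical questions.
codes = [[0, 0], [0, 0], [1, 1], [1, 1], [0, 1]]
row_coords, col_coords, inertia = mca(codes)
```

The only expensive step is one SVD of a matrix the size of the indicator table, which is what makes the method attractive for large data.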
Modeling Individual Cyclic Variation in Human Behavior
Cycles are fundamental to human health and behavior. However, modeling cycles
in time series data is challenging because in most cases the cycles are not
labeled or directly observed and need to be inferred from multidimensional
measurements taken over time. Here, we present CyHMMs, a cyclic hidden Markov
model method for detecting and modeling cycles in a collection of
multidimensional heterogeneous time series data. In contrast to previous cycle
modeling methods, CyHMMs deal with a number of challenges encountered in
modeling real-world cycles: they can model multivariate data with discrete and
continuous dimensions; they explicitly model and are robust to missing data;
and they can share information across individuals to model variation both
within and between individual time series. Experiments on synthetic and
real-world health-tracking data demonstrate that CyHMMs infer cycle lengths
more accurately than existing methods, with 58% lower error on simulated data
and 63% lower error on real-world data compared to the best-performing
baseline. CyHMMs can also perform functions which baselines cannot: they can
model the progression of individual features/symptoms over the course of the
cycle, identify the most variable features, and cluster individual time series
into groups with distinct characteristics. Applying CyHMMs to two real-world
health-tracking datasets -- of menstrual cycle symptoms and physical activity
tracking data -- yields important insights including which symptoms to expect
at each point during the cycle. We also find that people fall into several
groups with distinct cycle patterns, and that these groups differ along
dimensions not provided to the model. For example, by modeling missing data in
the menstrual cycles dataset, we are able to discover a medically relevant
group of birth control users even though information on birth control is not
given to the model.
Comment: Accepted at WWW 201
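To make the idea of a cyclic hidden state structure concrete, the sketch below builds a transition matrix in which each hidden state either persists or advances to the next state, with the last state wrapping around to the first. This is a simplified illustration under that assumption, not the CyHMM parameterization from the paper. It omits emissions, missing-data modeling, and information sharing across individuals. The function names and the stay probabilities are hypothetical.

```python
def cyclic_transition_matrix(stay_probs):
    """Transition matrix where state i stays with probability stay_probs[i]
    and otherwise moves to state (i + 1) mod K."""
    k = len(stay_probs)
    matrix = [[0.0] * k for _ in range(k)]
    for i, p in enumerate(stay_probs):
        matrix[i][i] += p
        matrix[i][(i + 1) % k] += 1.0 - p
    return matrix


def expected_cycle_length(stay_probs):
    """Expected number of time steps for one full pass through all states.

    The dwell time in a state is geometric with mean 1 / (1 - p)."""
    return sum(1.0 / (1.0 - p) for p in stay_probs)


# Hypothetical four-phase cycle with per-step stay probabilities.
stay_probs = [0.8, 0.9, 0.75, 0.9]
transitions = cyclic_transition_matrix(stay_probs)
cycle_length = expected_cycle_length(stay_probs)
```

With these values the expected dwell times are about 5, 10, 4, and 10 steps, so a full cycle takes about 29 steps on average. In a fitted model the cycle length would be inferred from data rather than set by hand.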
Heuristic Approaches for Generating Local Process Models through Log Projections
Local Process Model (LPM) discovery is focused on the mining of a set of
process models where each model describes the behavior represented in the event
log only partially, i.e. subsets of possible events are taken into account to
create so-called local process models. Often such smaller models provide
valuable insights into the behavior of the process, especially when no adequate
and comprehensible single overall process model exists that is able to describe
the traces of the process from start to end. The practical application of LPM
discovery is however hindered by computational issues in the case of logs with
many activities (problems may already occur when there are more than 17 unique
activities). In this paper, we explore three heuristics to discover subsets of
activities that lead to useful log projections with the goal of speeding up LPM
discovery considerably while still finding high-quality LPMs. We found that a
Markov clustering approach to create projection sets results in the largest
improvement of execution time, with discovered LPMs still being better than
with the use of randomly generated activity sets of the same size. Another
heuristic, based on log entropy, yields a more moderate speedup, but enables
the discovery of higher quality LPMs. The third heuristic, based on the
relative information gain, shows unstable performance: for some data sets the
speedup and LPM quality are higher than with the log entropy based method,
while for other data sets there is no speedup at all.
Comment: paper accepted and to appear in the proceedings of the IEEE Symposium
on Computational Intelligence and Data Mining (CIDM), special session on
Process Mining, part of the Symposium Series on Computational Intelligence
(SSCI
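As a minimal sketch of what a log projection is, the code below restricts every trace in an event log to a chosen subset of activities. It assumes a log is a list of traces and a trace is a list of activity names. Dropping traces that become empty is a simplifying choice made here, not something the abstract specifies, and the heuristics for choosing the subsets (Markov clustering, log entropy, relative information gain) are not implemented. The function name `project_log` and the example log are hypothetical.

```python
def project_log(log, activities):
    """Keep only the events whose activity is in `activities`,
    preserving their order within each trace."""
    keep = set(activities)
    projected = []
    for trace in log:
        kept = [event for event in trace if event in keep]
        if kept:  # assumption: traces with no remaining events are dropped
            projected.append(kept)
    return projected


# Hypothetical event log with three traces over six activities.
log = [
    ["a", "b", "c", "d"],
    ["a", "c", "b", "d"],
    ["e", "f"],
]
projected = project_log(log, {"a", "d"})
```

Running LPM discovery on several small projections of this kind, rather than on the full log, is what makes the choice of activity subsets matter for both speed and model quality.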
Multilayer Networks
In most natural and engineered systems, a set of entities interact with each
other in complicated patterns that can encompass multiple types of
relationships, change in time, and include other types of complications. Such
systems include multiple subsystems and layers of connectivity, and it is
important to take such "multilayer" features into account to try to improve our
understanding of complex systems. Consequently, it is necessary to generalize
"traditional" network theory by developing (and validating) a framework and
associated tools to study multilayer systems in a comprehensive fashion. The
origins of such efforts date back several decades and arose in multiple
disciplines, and now the study of multilayer networks has become one of the
most important directions in network science. In this paper, we discuss the
history of multilayer networks (and related concepts) and review the exploding
body of work on such networks. To unify the disparate terminology in the large
body of recent work, we discuss a general framework for multilayer networks,
construct a dictionary of terminology to relate the numerous existing concepts
to each other, and provide a thorough discussion that compares, contrasts, and
translates between related notions such as multilayer networks, multiplex
networks, interdependent networks, networks of networks, and many others. We
also survey and discuss existing data sets that can be represented as
multilayer networks. We review attempts to generalize single-layer-network
diagnostics to multilayer networks. We also discuss the rapidly expanding
research on multilayer-network models and notions like community structure,
connected components, tensor decompositions, and various types of dynamical
processes on multilayer networks. We conclude with a summary and an outlook.
Comment: Working paper; 59 pages, 8 figure