Multimodal nested sampling: an efficient and robust alternative to MCMC methods for astronomical data analysis
In performing a Bayesian analysis of astronomical data, two difficult
problems often emerge. First, in estimating the parameters of some model for
the data, the resulting posterior distribution may be multimodal or exhibit
pronounced (curving) degeneracies, which can cause problems for traditional
MCMC sampling methods. Second, in selecting between a set of competing models,
calculation of the Bayesian evidence for each model is computationally
expensive. The nested sampling method introduced by Skilling (2004) has
greatly reduced the computational expense of calculating evidences and also
produces posterior inferences as a by-product. This method has been applied
successfully in cosmological applications by Mukherjee et al. (2006), but their
implementation was efficient only for unimodal distributions without pronounced
degeneracies. Shaw et al. (2007) recently introduced a clustered nested
sampling method which is significantly more efficient in sampling from
multimodal posteriors and also determines the expectation and variance of the
final evidence from a single run of the algorithm, hence providing a further
increase in efficiency. In this paper, we build on the work of Shaw et al. and
present three new methods for sampling and evidence evaluation from
distributions that may contain multiple modes and significant degeneracies; we
also present an even more efficient technique for estimating the uncertainty on
the evaluated evidence. These methods lead to a further substantial improvement
in sampling efficiency and robustness, and are applied to toy problems to
demonstrate the accuracy and economy of the evidence calculation and parameter
estimation. Finally, we discuss the use of these methods in performing Bayesian
object detection in astronomical datasets.
Comment: 14 pages, 11 figures, submitted to MNRAS; some major additions to the
previous version in response to the referee's comments.
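The core nested-sampling loop that this line of work builds on can be sketched in a few lines. The toy below is an illustrative single-mode version only (uniform prior on [0, 1], the expected exp(-i/N) prior-volume shrinkage, and naive rejection sampling for the constrained draw); it is not the clustered or multimodal machinery the paper develops:

```python
import math
import random

def nested_sampling(loglike, n_live=100, n_iter=600, seed=0):
    """Minimal nested sampling on a Uniform(0, 1) prior.

    The evidence Z = integral of L(x) dx is accumulated as
    Z ~ sum_i L_i * (X_{i-1} - X_i), using the expected prior-volume
    shrinkage X_i = exp(-i / n_live)."""
    rng = random.Random(seed)
    live = [rng.random() for _ in range(n_live)]
    logls = [loglike(x) for x in live]
    z, x_prev = 0.0, 1.0
    for i in range(1, n_iter + 1):
        worst = min(range(n_live), key=lambda j: logls[j])
        l_min = logls[worst]
        x_i = math.exp(-i / n_live)
        z += math.exp(l_min) * (x_prev - x_i)
        x_prev = x_i
        # replace the discarded point: draw from the prior until the
        # hard likelihood constraint L > L_min is met (rejection sampling)
        while True:
            cand = rng.random()
            if loglike(cand) > l_min:
                live[worst], logls[worst] = cand, loglike(cand)
                break
    # contribution of the final live set
    z += x_prev * sum(math.exp(l) for l in logls) / n_live
    return z

# unimodal Gaussian likelihood, sigma = 0.1; the analytic evidence is ~0.2507
z_est = nested_sampling(lambda x: -0.5 * ((x - 0.5) / 0.1) ** 2)
```

The rejection-sampling step is exactly what becomes inefficient for multimodal or strongly degenerate posteriors, which is the problem the clustered samplers above address.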
Symbol Emergence in Robotics: A Survey
Humans can learn the use of language through physical interaction with their
environment and semiotic communication with other people. It is important
to obtain a computational understanding of how humans form a symbol system
and acquire semiotic skills through their autonomous mental development.
Recently, many studies have been conducted on the construction of robotic
systems and machine-learning methods that can learn the use of language through
embodied multimodal interaction with their environment and other systems.
Understanding human social interactions and developing a robot that can
smoothly communicate with human users in the long term both require an
understanding of the dynamics of symbol systems. The
embodied cognition and social interaction of participants gradually change a
symbol system in a constructive manner. In this paper, we introduce a field of
research called symbol emergence in robotics (SER). SER is a constructive
approach towards an emergent symbol system. The emergent symbol system is
socially self-organized through both semiotic communications and physical
interactions with autonomous cognitive developmental agents, i.e., humans and
developmental robots. Specifically, we describe some state-of-the-art research
topics concerning SER, e.g., multimodal categorization, word discovery, and
double articulation analysis, which enable a robot to obtain words and their
embodied meanings from raw sensory-motor information, including visual
information, haptic information, auditory information, and acoustic speech
signals, in a totally unsupervised manner. Finally, we suggest future
directions of research in SER.
Comment: submitted to Advanced Robotics.
Radio Galaxy Detection in the Visibility Domain
We explore a new Bayesian method of detecting galaxies from radio
interferometric data of the faint sky. Working in the Fourier domain, we fit a
single, parameterised galaxy model to simulated visibility data of star-forming
galaxies. The resulting multimodal posterior distribution is then sampled using
a multimodal nested sampling algorithm such as MultiNest. For each galaxy, we
construct parameter estimates for the position, flux, scale-length and
ellipticities from the posterior samples. We first test our approach on
simulated SKA1-MID visibility data of up to 100 galaxies in the field of view,
considering a typical weak lensing survey regime (SNR ) where 98% of
the input galaxies are detected with no spurious source detections. We then
explore the low SNR regime, finding our approach reliable in galaxy detection
and providing in particular high accuracy in positional estimates down to SNR
. The presented method does not require transformation of visibilities
to the image domain, and requires no prior knowledge of the number of galaxies
in the field of view, and thus could become a useful tool for constructing
accurate radio galaxy catalogs in the future.
Comment: 11 pages, 11 figures. Accepted for publication in MNRAS.
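Visibility-domain model fitting can be illustrated with a minimal toy, which is not the paper's pipeline: simulated point-source visibilities with Gaussian noise, a Gaussian likelihood, and a crude grid search over position standing in for the nested-sampling step. The point-source model V(u, v) = S * exp(-2*pi*i*(u*l + v*m)) and every number below are illustrative assumptions:

```python
import cmath
import math
import random

def model_vis(u, v, flux, l, m):
    """Point-source visibility: V(u, v) = S * exp(-2*pi*i*(u*l + v*m))."""
    return flux * cmath.exp(-2j * math.pi * (u * l + v * m))

# simulate noisy visibilities for one source (all numbers illustrative)
rng = random.Random(1)
uv = [(rng.uniform(-50, 50), rng.uniform(-50, 50)) for _ in range(200)]
true_l, true_m, sigma = 0.013, -0.008, 0.3
data = [model_vis(u, v, 1.0, true_l, true_m)
        + complex(rng.gauss(0, sigma), rng.gauss(0, sigma))
        for u, v in uv]

def loglike(l, m, flux=1.0):
    """Gaussian log-likelihood of the visibilities (up to a constant)."""
    return -sum(abs(d - model_vis(u, v, flux, l, m)) ** 2
                for (u, v), d in zip(uv, data)) / (2 * sigma ** 2)

# crude grid search over position, standing in for the sampler
grid = [i * 0.001 - 0.02 for i in range(41)]
best = max(((l, m) for l in grid for m in grid), key=lambda p: loglike(*p))
```

Note that the data never need to be imaged: the likelihood is evaluated directly against the visibilities, which is the key point of the approach described above.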
Machine Learning and Integrative Analysis of Biomedical Big Data
Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges as well as exacerbates the ones associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues.
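One of the five challenges, class imbalance, has a standard baseline remedy that is easy to sketch: random oversampling of the minority class until class counts match. This is illustrative only; real multi-omics pipelines typically use more sophisticated schemes such as synthetic minority oversampling or cost-sensitive losses:

```python
import random
from collections import Counter

def random_oversample(X, y, seed=0):
    """Balance a dataset by resampling minority-class rows at random
    until every class matches the majority-class count."""
    rng = random.Random(seed)
    counts = Counter(y)
    majority = max(counts, key=counts.get)
    out_X, out_y = list(X), list(y)
    for label, n in counts.items():
        if label == majority:
            continue
        pool = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(counts[majority] - n):
            out_X.append(rng.choice(pool))
            out_y.append(label)
    return out_X, out_y

# tiny imbalanced example: three class-0 rows, two class-1 rows
X = [[0.1], [0.2], [0.3], [0.9], [1.1]]
y = [0, 0, 0, 1, 1]
Xb, yb = random_oversample(X, y)
```

Oversampling must be applied only to the training split, otherwise duplicated rows leak into evaluation and inflate the measured performance.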
apk2vec: Semi-supervised multi-view representation learning for profiling Android applications
Building behavior profiles of Android applications (apps) with holistic, rich
and multi-view information (e.g., incorporating several semantic views of an
app such as API sequences, system calls, etc.) would significantly benefit
downstream analytics tasks such as app categorization, recommendation and
malware analysis. Towards this goal, we design a semi-supervised
Representation Learning (RL) framework named apk2vec to automatically generate
a compact representation (aka profile/embedding) for a given app. More
specifically, apk2vec has the following three unique characteristics, which make
it an excellent choice for large-scale app profiling: (1) it encompasses
information from multiple semantic views such as API sequences, permissions,
etc., (2) being a semi-supervised embedding technique, it can make use of
labels associated with apps (e.g., malware family or app category labels) to
build high quality app profiles, and (3) it combines RL and feature hashing
which allows it to efficiently build profiles of apps that stream over time
(i.e., online learning). The resulting semi-supervised multi-view hash
embeddings of apps could then be used for a wide variety of downstream tasks
such as the ones mentioned above. Our extensive evaluations with more than
42,000 apps demonstrate that apk2vec's app profiles could significantly
outperform state-of-the-art techniques in four app analytics tasks namely,
malware detection, familial clustering, app clone detection and app
recommendation.
Comment: International Conference on Data Mining, 201
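The feature-hashing ingredient that lets a profiler handle apps streaming over time can be illustrated independently of the full framework. In this sketch the token names and dimensionality are made up, and MD5 merely stands in for whatever hash family a real system would use; the point is that an open-ended token set (API calls, permissions, ...) maps to a fixed-length signed vector, so unseen tokens never require growing a vocabulary:

```python
import hashlib

def hash_features(tokens, dim=8):
    """The hashing trick: bucket each token into one of `dim` slots with
    a hash-derived sign, yielding a fixed-length vector regardless of
    how many distinct tokens ever appear in the stream."""
    vec = [0.0] * dim
    for tok in tokens:
        h = int(hashlib.md5(tok.encode()).hexdigest(), 16)
        idx = h % dim
        sign = 1.0 if (h // dim) % 2 == 0 else -1.0  # signed hashing reduces collision bias
        vec[idx] += sign
    return vec

# hypothetical behavior tokens for one app
profile = hash_features(["READ_SMS", "android.net.Uri", "open", "READ_SMS"])
```

Because the mapping is stateless and deterministic, two independent workers hashing the same app produce identical profiles, which is what makes the scheme suitable for online learning.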
Making Laplacians commute
In this paper, we construct multimodal spectral geometry by finding a pair of
closest commuting operators (CCO) to a given pair of Laplacians. The CCOs are
jointly diagonalizable and hence have the same eigenbasis. Our construction
naturally extends classical data analysis tools based on spectral geometry,
such as diffusion maps and spectral clustering. We provide several synthetic
and real examples of applications in dimensionality reduction, shape analysis,
and clustering, demonstrating that our method better captures the inherent
structure of multi-modal data.
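The premise of the construction (commuting operators are exactly the jointly diagonalizable ones) can be probed numerically with a commutator norm. The toy below, using plain lists and 2x2 examples rather than a linear-algebra library, is only that diagnostic, not the paper's closest-commuting-operator optimization:

```python
def matmul(A, B):
    """Product of two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def commutator_norm(A, B):
    """Frobenius norm of [A, B] = AB - BA; zero iff A and B commute."""
    AB, BA = matmul(A, B), matmul(B, A)
    n = len(A)
    return sum((AB[i][j] - BA[i][j]) ** 2
               for i in range(n) for j in range(n)) ** 0.5

# two matrices diagonal in the same basis commute exactly ...
D1, D2 = [[2.0, 0.0], [0.0, 3.0]], [[5.0, 0.0], [0.0, 7.0]]
# ... while a generic pair (here the Pauli X and Z matrices) does not
A, B = [[0.0, 1.0], [1.0, 0.0]], [[1.0, 0.0], [0.0, -1.0]]
```

Minimizing this norm over perturbations of two given Laplacians is, loosely, what the closest-commuting-operator construction does; once the norm is zero, a shared eigenbasis exists.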
Unsupervised spike detection and sorting with wavelets and superparamagnetic clustering
This study introduces a new method for detecting and sorting spikes from multiunit recordings. The method combines the wavelet transform, which localizes distinctive spike features, with superparamagnetic clustering,
which allows automatic classification of the data without assumptions such as low variance or Gaussian distributions. Moreover, an improved method for setting amplitude thresholds for spike detection is proposed. We describe several criteria for implementation that render the algorithm unsupervised and fast. The algorithm is compared to other conventional methods using several simulated data sets whose characteristics closely resemble those of in vivo recordings. For these data sets, we found that
the proposed algorithm outperformed conventional methods.
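The improved amplitude-threshold idea (estimating the noise level robustly, so that the spikes themselves do not inflate it) is commonly implemented via the median of absolute values. The sketch below assumes that specific estimator, sigma_n = median(|x|) / 0.6745 with threshold k * sigma_n and k near 4, which is the usual formulation of this style of method:

```python
import random
import statistics

def spike_threshold(signal, k=4.0):
    """Robust amplitude threshold for spike detection.

    The median of |x| is barely affected by rare, large spikes, unlike
    the raw standard deviation. For Gaussian background noise,
    median(|x|) = 0.6745 * sigma, so dividing by 0.6745 recovers the
    noise std, and the threshold is k times that estimate."""
    sigma_n = statistics.median(abs(v) for v in signal) / 0.6745
    return k * sigma_n

# pure Gaussian noise with sigma = 1: the threshold should land near k
rng = random.Random(0)
trace = [rng.gauss(0.0, 1.0) for _ in range(20000)]
thr = spike_threshold(trace)

# adding 1% large "spikes" barely moves the estimate
spiky = trace + [50.0] * 200
thr_spiky = spike_threshold(spiky)
```

A naive standard-deviation estimate on the spiky trace would jump sharply, which is why the median-based version is preferred for setting detection thresholds.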