21,636 research outputs found

    Multimodal nested sampling: an efficient and robust alternative to MCMC methods for astronomical data analysis

    Full text link
    In performing a Bayesian analysis of astronomical data, two difficult problems often emerge. First, in estimating the parameters of some model for the data, the resulting posterior distribution may be multimodal or exhibit pronounced (curving) degeneracies, which can cause problems for traditional MCMC sampling methods. Second, in selecting between a set of competing models, calculation of the Bayesian evidence for each model is computationally expensive. The nested sampling method introduced by Skilling (2004), has greatly reduced the computational expense of calculating evidences and also produces posterior inferences as a by-product. This method has been applied successfully in cosmological applications by Mukherjee et al. (2006), but their implementation was efficient only for unimodal distributions without pronounced degeneracies. Shaw et al. (2007), recently introduced a clustered nested sampling method which is significantly more efficient in sampling from multimodal posteriors and also determines the expectation and variance of the final evidence from a single run of the algorithm, hence providing a further increase in efficiency. In this paper, we build on the work of Shaw et al. and present three new methods for sampling and evidence evaluation from distributions that may contain multiple modes and significant degeneracies; we also present an even more efficient technique for estimating the uncertainty on the evaluated evidence. These methods lead to a further substantial improvement in sampling efficiency and robustness, and are applied to toy problems to demonstrate the accuracy and economy of the evidence calculation and parameter estimation. Finally, we discuss the use of these methods in performing Bayesian object detection in astronomical datasets.Comment: 14 pages, 11 figures, submitted to MNRAS, some major additions to the previous version in response to the referee's comment

    Symbol Emergence in Robotics: A Survey

    Full text link
    Humans can learn the use of language through physical interaction with their environment and semiotic communication with other people. It is very important to obtain a computational understanding of how humans can form a symbol system and obtain semiotic skills through their autonomous mental development. Recently, many studies have been conducted on the construction of robotic systems and machine-learning methods that can learn the use of language through embodied multimodal interaction with their environment and other systems. Understanding human social interactions and developing a robot that can smoothly communicate with human users in the long term, requires an understanding of the dynamics of symbol systems and is crucially important. The embodied cognition and social interaction of participants gradually change a symbol system in a constructive manner. In this paper, we introduce a field of research called symbol emergence in robotics (SER). SER is a constructive approach towards an emergent symbol system. The emergent symbol system is socially self-organized through both semiotic communications and physical interactions with autonomous cognitive developmental agents, i.e., humans and developmental robots. Specifically, we describe some state-of-art research topics concerning SER, e.g., multimodal categorization, word discovery, and a double articulation analysis, that enable a robot to obtain words and their embodied meanings from raw sensory--motor information, including visual information, haptic information, auditory information, and acoustic speech signals, in a totally unsupervised manner. Finally, we suggest future directions of research in SER.Comment: submitted to Advanced Robotic

    Radio Galaxy Detection in the Visibility Domain

    Get PDF
    We explore a new Bayesian method of detecting galaxies from radio interferometric data of the faint sky. Working in the Fourier domain, we fit a single, parameterised galaxy model to simulated visibility data of star-forming galaxies. The resulting multimodal posterior distribution is then sampled using a multimodal nested sampling algorithm such as MultiNest. For each galaxy, we construct parameter estimates for the position, flux, scale-length and ellipticities from the posterior samples. We first test our approach on simulated SKA1-MID visibility data of up to 100 galaxies in the field of view, considering a typical weak lensing survey regime (SNR ≥10\ge 10) where 98% of the input galaxies are detected with no spurious source detections. We then explore the low SNR regime, finding our approach reliable in galaxy detection and providing in particular high accuracy in positional estimates down to SNR ∼5\sim 5. The presented method does not require transformation of visibilities to the image domain, and requires no prior knowledge of the number of galaxies in the field of view, thus could become a useful tool for constructing accurate radio galaxy catalogs in the future.Comment: 11 pages, 11 figures. Accepted for publication in MNRA

    Machine Learning and Integrative Analysis of Biomedical Big Data.

    Get PDF
    Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges as well as exacerbates the ones associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues

    apk2vec: Semi-supervised multi-view representation learning for profiling Android applications

    Full text link
    Building behavior profiles of Android applications (apps) with holistic, rich and multi-view information (e.g., incorporating several semantic views of an app such as API sequences, system calls, etc.) would help catering downstream analytics tasks such as app categorization, recommendation and malware analysis significantly better. Towards this goal, we design a semi-supervised Representation Learning (RL) framework named apk2vec to automatically generate a compact representation (aka profile/embedding) for a given app. More specifically, apk2vec has the three following unique characteristics which make it an excellent choice for largescale app profiling: (1) it encompasses information from multiple semantic views such as API sequences, permissions, etc., (2) being a semi-supervised embedding technique, it can make use of labels associated with apps (e.g., malware family or app category labels) to build high quality app profiles, and (3) it combines RL and feature hashing which allows it to efficiently build profiles of apps that stream over time (i.e., online learning). The resulting semi-supervised multi-view hash embeddings of apps could then be used for a wide variety of downstream tasks such as the ones mentioned above. Our extensive evaluations with more than 42,000 apps demonstrate that apk2vec's app profiles could significantly outperform state-of-the-art techniques in four app analytics tasks namely, malware detection, familial clustering, app clone detection and app recommendation.Comment: International Conference on Data Mining, 201

    Making Laplacians commute

    Full text link
    In this paper, we construct multimodal spectral geometry by finding a pair of closest commuting operators (CCO) to a given pair of Laplacians. The CCOs are jointly diagonalizable and hence have the same eigenbasis. Our construction naturally extends classical data analysis tools based on spectral geometry, such as diffusion maps and spectral clustering. We provide several synthetic and real examples of applications in dimensionality reduction, shape analysis, and clustering, demonstrating that our method better captures the inherent structure of multi-modal data

    Unsupervised spike detection and sorting with wavelets and superparamagnetic clustering

    Get PDF
    This study introduces a new method for detecting and sorting spikes from multiunit recordings. The method combines the wavelet transform, which localizes distinctive spike features, with superparamagnetic clustering, which allows automatic classification of the data without assumptions such as low variance or gaussian distributions. Moreover, an improved method for setting amplitude thresholds for spike detection is proposed. We describe several criteria for implementation that render the algorithm unsupervised and fast. The algorithm is compared to other conventional methods using several simulated data sets whose characteristics closely resemble those of in vivo recordings. For these data sets, we found that the proposed algorithm outperformed conventional methods
    • …
    corecore