168,490 research outputs found

    Learning what matters - Sampling interesting patterns

    Get PDF
    In the field of exploratory data mining, local structure in data can be described by patterns and discovered by mining algorithms. Although many solutions have been proposed to address the redundancy problems in pattern mining, most of them either provide succinct pattern sets or take the interests of the user into account-but not both. Consequently, the analyst has to invest substantial effort in identifying those patterns that are relevant to her specific interests and goals. To address this problem, we propose a novel approach that combines pattern sampling with interactive data mining. In particular, we introduce the LetSIP algorithm, which builds upon recent advances in 1) weighted sampling in SAT and 2) learning to rank in interactive pattern mining. Specifically, it exploits user feedback to directly learn the parameters of the sampling distribution that represents the user's interests. We compare the performance of the proposed algorithm to the state-of-the-art in interactive pattern mining by emulating the interests of a user. The resulting system allows efficient and interleaved learning and sampling, thus user-specific anytime data exploration. Finally, LetSIP demonstrates favourable trade-offs concerning both quality-diversity and exploitation-exploration when compared to existing methods.Comment: PAKDD 2017, extended versio

    Intelligent sampling for the measurement of structured surfaces

    Get PDF
    Uniform sampling in metrology has known drawbacks such as coherent spectral aliasing and a lack of efficiency in terms of measuring time and data storage. The requirement for intelligent sampling strategies has been outlined over recent years, particularly where the measurement of structured surfaces is concerned. Most of the present research on intelligent sampling has focused on dimensional metrology using coordinate-measuring machines with little reported on the area of surface metrology. In the research reported here, potential intelligent sampling strategies for surface topography measurement of structured surfaces are investigated by using numerical simulation and experimental verification. The methods include the jittered uniform method, low-discrepancy pattern sampling and several adaptive methods which originate from computer graphics, coordinate metrology and previous research by the authors. By combining the use of advanced reconstruction methods and feature-based characterization techniques, the measurement performance of the sampling methods is studied using case studies. The advantages, stability and feasibility of these techniques for practical measurements are discussed

    Finding long cycles in graphs

    Full text link
    We analyze the problem of discovering long cycles inside a graph. We propose and test two algorithms for this task. The first one is based on recent advances in statistical mechanics and relies on a message passing procedure. The second follows a more standard Monte Carlo Markov Chain strategy. Special attention is devoted to Hamiltonian cycles of (non-regular) random graphs of minimal connectivity equal to three

    Patterns of Scalable Bayesian Inference

    Full text link
    Datasets are growing not just in size but in complexity, creating a demand for rich models and quantification of uncertainty. Bayesian methods are an excellent fit for this demand, but scaling Bayesian inference is a challenge. In response to this challenge, there has been considerable recent work based on varying assumptions about model structure, underlying computational resources, and the importance of asymptotic correctness. As a result, there is a zoo of ideas with few clear overarching principles. In this paper, we seek to identify unifying principles, patterns, and intuitions for scaling Bayesian inference. We review existing work on utilizing modern computing resources with both MCMC and variational approximation techniques. From this taxonomy of ideas, we characterize the general principles that have proven successful for designing scalable inference procedures and comment on the path forward

    Random field sampling for a simplified model of melt-blowing considering turbulent velocity fluctuations

    Full text link
    In melt-blowing very thin liquid fiber jets are spun due to high-velocity air streams. In literature there is a clear, unsolved discrepancy between the measured and computed jet attenuation. In this paper we will verify numerically that the turbulent velocity fluctuations causing a random aerodynamic drag on the fiber jets -- that has been neglected so far -- are the crucial effect to close this gap. For this purpose, we model the velocity fluctuations as vector Gaussian random fields on top of a k-epsilon turbulence description and develop an efficient sampling procedure. Taking advantage of the special covariance structure the effort of the sampling is linear in the discretization and makes the realization possible

    Accelerated High-Resolution Photoacoustic Tomography via Compressed Sensing

    Get PDF
    Current 3D photoacoustic tomography (PAT) systems offer either high image quality or high frame rates but are not able to deliver high spatial and temporal resolution simultaneously, which limits their ability to image dynamic processes in living tissue. A particular example is the planar Fabry-Perot (FP) scanner, which yields high-resolution images but takes several minutes to sequentially map the photoacoustic field on the sensor plane, point-by-point. However, as the spatio-temporal complexity of many absorbing tissue structures is rather low, the data recorded in such a conventional, regularly sampled fashion is often highly redundant. We demonstrate that combining variational image reconstruction methods using spatial sparsity constraints with the development of novel PAT acquisition systems capable of sub-sampling the acoustic wave field can dramatically increase the acquisition speed while maintaining a good spatial resolution: First, we describe and model two general spatial sub-sampling schemes. Then, we discuss how to implement them using the FP scanner and demonstrate the potential of these novel compressed sensing PAT devices through simulated data from a realistic numerical phantom and through measured data from a dynamic experimental phantom as well as from in-vivo experiments. Our results show that images with good spatial resolution and contrast can be obtained from highly sub-sampled PAT data if variational image reconstruction methods that describe the tissues structures with suitable sparsity-constraints are used. In particular, we examine the use of total variation regularization enhanced by Bregman iterations. These novel reconstruction strategies offer new opportunities to dramatically increase the acquisition speed of PAT scanners that employ point-by-point sequential scanning as well as reducing the channel count of parallelized schemes that use detector arrays.Comment: submitted to "Physics in Medicine and Biology

    Mining Frequent Graph Patterns with Differential Privacy

    Full text link
    Discovering frequent graph patterns in a graph database offers valuable information in a variety of applications. However, if the graph dataset contains sensitive data of individuals such as mobile phone-call graphs and web-click graphs, releasing discovered frequent patterns may present a threat to the privacy of individuals. {\em Differential privacy} has recently emerged as the {\em de facto} standard for private data analysis due to its provable privacy guarantee. In this paper we propose the first differentially private algorithm for mining frequent graph patterns. We first show that previous techniques on differentially private discovery of frequent {\em itemsets} cannot apply in mining frequent graph patterns due to the inherent complexity of handling structural information in graphs. We then address this challenge by proposing a Markov Chain Monte Carlo (MCMC) sampling based algorithm. Unlike previous work on frequent itemset mining, our techniques do not rely on the output of a non-private mining algorithm. Instead, we observe that both frequent graph pattern mining and the guarantee of differential privacy can be unified into an MCMC sampling framework. In addition, we establish the privacy and utility guarantee of our algorithm and propose an efficient neighboring pattern counting technique as well. Experimental results show that the proposed algorithm is able to output frequent patterns with good precision
    • …
    corecore