168,490 research outputs found
Learning what matters - Sampling interesting patterns
In the field of exploratory data mining, local structure in data can be
described by patterns and discovered by mining algorithms. Although many
solutions have been proposed to address the redundancy problems in pattern
mining, most of them either provide succinct pattern sets or take the interests
of the user into account-but not both. Consequently, the analyst has to invest
substantial effort in identifying those patterns that are relevant to her
specific interests and goals. To address this problem, we propose a novel
approach that combines pattern sampling with interactive data mining. In
particular, we introduce the LetSIP algorithm, which builds upon recent
advances in 1) weighted sampling in SAT and 2) learning to rank in interactive
pattern mining. Specifically, it exploits user feedback to directly learn the
parameters of the sampling distribution that represents the user's interests.
We compare the performance of the proposed algorithm to the state-of-the-art in
interactive pattern mining by emulating the interests of a user. The resulting
system allows efficient and interleaved learning and sampling, thus
user-specific anytime data exploration. Finally, LetSIP demonstrates favourable
trade-offs concerning both quality-diversity and exploitation-exploration when
compared to existing methods.Comment: PAKDD 2017, extended versio
Intelligent sampling for the measurement of structured surfaces
Uniform sampling in metrology has known drawbacks such as coherent spectral aliasing and a lack of efficiency in terms of measuring time and data storage. The requirement for intelligent sampling strategies has been outlined over recent years, particularly where the measurement of structured surfaces is concerned. Most of the present research on intelligent sampling has focused on dimensional metrology using coordinate-measuring machines with little reported on the area of surface metrology. In the research reported here, potential intelligent sampling strategies for surface topography measurement of structured surfaces are investigated by using numerical simulation and experimental verification. The methods include the jittered uniform method, low-discrepancy pattern sampling and several adaptive methods which originate from computer graphics, coordinate metrology and previous research by the authors. By combining the use of advanced reconstruction methods and feature-based characterization techniques, the measurement performance of the sampling methods is studied using case studies. The advantages, stability and feasibility of these techniques for practical measurements are discussed
Finding long cycles in graphs
We analyze the problem of discovering long cycles inside a graph. We propose
and test two algorithms for this task. The first one is based on recent
advances in statistical mechanics and relies on a message passing procedure.
The second follows a more standard Monte Carlo Markov Chain strategy. Special
attention is devoted to Hamiltonian cycles of (non-regular) random graphs of
minimal connectivity equal to three
Patterns of Scalable Bayesian Inference
Datasets are growing not just in size but in complexity, creating a demand
for rich models and quantification of uncertainty. Bayesian methods are an
excellent fit for this demand, but scaling Bayesian inference is a challenge.
In response to this challenge, there has been considerable recent work based on
varying assumptions about model structure, underlying computational resources,
and the importance of asymptotic correctness. As a result, there is a zoo of
ideas with few clear overarching principles.
In this paper, we seek to identify unifying principles, patterns, and
intuitions for scaling Bayesian inference. We review existing work on utilizing
modern computing resources with both MCMC and variational approximation
techniques. From this taxonomy of ideas, we characterize the general principles
that have proven successful for designing scalable inference procedures and
comment on the path forward
Random field sampling for a simplified model of melt-blowing considering turbulent velocity fluctuations
In melt-blowing very thin liquid fiber jets are spun due to high-velocity air
streams. In literature there is a clear, unsolved discrepancy between the
measured and computed jet attenuation. In this paper we will verify numerically
that the turbulent velocity fluctuations causing a random aerodynamic drag on
the fiber jets -- that has been neglected so far -- are the crucial effect to
close this gap. For this purpose, we model the velocity fluctuations as vector
Gaussian random fields on top of a k-epsilon turbulence description and develop
an efficient sampling procedure. Taking advantage of the special covariance
structure the effort of the sampling is linear in the discretization and makes
the realization possible
Accelerated High-Resolution Photoacoustic Tomography via Compressed Sensing
Current 3D photoacoustic tomography (PAT) systems offer either high image
quality or high frame rates but are not able to deliver high spatial and
temporal resolution simultaneously, which limits their ability to image dynamic
processes in living tissue. A particular example is the planar Fabry-Perot (FP)
scanner, which yields high-resolution images but takes several minutes to
sequentially map the photoacoustic field on the sensor plane, point-by-point.
However, as the spatio-temporal complexity of many absorbing tissue structures
is rather low, the data recorded in such a conventional, regularly sampled
fashion is often highly redundant. We demonstrate that combining variational
image reconstruction methods using spatial sparsity constraints with the
development of novel PAT acquisition systems capable of sub-sampling the
acoustic wave field can dramatically increase the acquisition speed while
maintaining a good spatial resolution: First, we describe and model two general
spatial sub-sampling schemes. Then, we discuss how to implement them using the
FP scanner and demonstrate the potential of these novel compressed sensing PAT
devices through simulated data from a realistic numerical phantom and through
measured data from a dynamic experimental phantom as well as from in-vivo
experiments. Our results show that images with good spatial resolution and
contrast can be obtained from highly sub-sampled PAT data if variational image
reconstruction methods that describe the tissues structures with suitable
sparsity-constraints are used. In particular, we examine the use of total
variation regularization enhanced by Bregman iterations. These novel
reconstruction strategies offer new opportunities to dramatically increase the
acquisition speed of PAT scanners that employ point-by-point sequential
scanning as well as reducing the channel count of parallelized schemes that use
detector arrays.Comment: submitted to "Physics in Medicine and Biology
Mining Frequent Graph Patterns with Differential Privacy
Discovering frequent graph patterns in a graph database offers valuable
information in a variety of applications. However, if the graph dataset
contains sensitive data of individuals such as mobile phone-call graphs and
web-click graphs, releasing discovered frequent patterns may present a threat
to the privacy of individuals. {\em Differential privacy} has recently emerged
as the {\em de facto} standard for private data analysis due to its provable
privacy guarantee. In this paper we propose the first differentially private
algorithm for mining frequent graph patterns.
We first show that previous techniques on differentially private discovery of
frequent {\em itemsets} cannot apply in mining frequent graph patterns due to
the inherent complexity of handling structural information in graphs. We then
address this challenge by proposing a Markov Chain Monte Carlo (MCMC) sampling
based algorithm. Unlike previous work on frequent itemset mining, our
techniques do not rely on the output of a non-private mining algorithm.
Instead, we observe that both frequent graph pattern mining and the guarantee
of differential privacy can be unified into an MCMC sampling framework. In
addition, we establish the privacy and utility guarantee of our algorithm and
propose an efficient neighboring pattern counting technique as well.
Experimental results show that the proposed algorithm is able to output
frequent patterns with good precision
- …