Comparison Of Reionization Models: Radiative Transfer Simulations And Approximate, Semi-Numeric Models
We compare the predictions of four different algorithms for the distribution
of ionized gas during the Epoch of Reionization. These algorithms are all used
to run a 100 Mpc/h simulation of reionization with the same initial conditions.
Two of the algorithms are state-of-the-art ray-tracing radiative transfer codes
that use disparate methods to calculate the ionization history. The other two
algorithms are fast but more approximate schemes based on iterative application
of a smoothing filter to the underlying source and density fields. We compare
these algorithms' resulting ionization and 21 cm fields using several different
statistical measures. The two radiative transfer schemes are in excellent
agreement with each other (with the cross-correlation coefficient of the
ionization fields > 0.8 for k < 10 h/Mpc) and in good agreement with the analytic
schemes (> 0.6 for k < 1 h/Mpc). When used to predict the 21 cm power spectrum at
different times during reionization, all ionization algorithms agree with one
another at the 10s of percent level. This agreement suggests that the different
approximations involved in the ray tracing algorithms are sensible and that
semi-numerical schemes provide a numerically-inexpensive, yet fairly accurate,
description of the reionization process.
Comment: 13 pages, 10 figures
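The semi-numeric schemes described above apply an excursion-set-style criterion: smooth the underlying field on successively smaller scales and mark as ionized any cell whose smoothed value crosses an ionization barrier on some scale. A minimal sketch of that idea, using a toy Gaussian filter and a hypothetical efficiency parameter `zeta` (the codes compared in the paper use more careful filters and explicit source fields):

```python
import numpy as np

def ionization_field(density, zeta=20.0, r_max=20.0, r_min=1.0, n_scales=10):
    """Toy excursion-set scheme: smooth the overdensity field with
    successively smaller filters (Gaussian, applied in Fourier space)
    and flag cells where zeta * delta_smoothed crosses the barrier."""
    n = density.shape[0]
    k = np.fft.fftfreq(n) * 2 * np.pi
    kx, ky, kz = np.meshgrid(k, k, k, indexing="ij")
    k2 = kx**2 + ky**2 + kz**2
    delta_k = np.fft.fftn(density)
    ionized = np.zeros_like(density, dtype=bool)
    for r in np.geomspace(r_max, r_min, n_scales):
        window = np.exp(-0.5 * k2 * r**2)      # Gaussian filter of radius r
        smoothed = np.fft.ifftn(delta_k * window).real
        ionized |= zeta * smoothed >= 1.0      # barrier crossing => ionized
    return ionized
```

Raising `zeta` (more ionizing photons per unit of matter) can only grow the ionized region, which makes for a quick sanity check of the scheme.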
Foundations of Declarative Data Analysis Using Limit Datalog Programs
Motivated by applications in declarative data analysis, we study Datalog_Z,
an extension of positive Datalog with arithmetic functions over integers. This
language is known to be undecidable, so we propose two fragments. In limit
Datalog_Z, predicates are axiomatised to keep minimal/maximal numeric values,
allowing us to show that fact entailment is coNExpTime-complete in combined, and
coNP-complete in data complexity. Moreover, an additional stability
requirement causes the complexity to drop to ExpTime and PTime, respectively.
Finally, we show that stable Datalog_Z can express many useful data analysis
tasks, and so our results provide a sound foundation for the development of
advanced information systems.
Comment: 23 pages; full version of a paper accepted at IJCAI-17; v2 fixes some
typos and improves the acknowledgment
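A limit predicate keeps only the minimal (or maximal) numeric value for each tuple of non-numeric arguments, rather than every derivable value. A toy illustration of this semantics, using a hypothetical shortest-distance program (an example of the style of task such programs express, not code from the paper) evaluated bottom-up in Python:

```python
def min_limit_fixpoint(edges, source):
    """Naive bottom-up evaluation of a limit-Datalog-style program
         dist(source, 0).
         dist(y, d + w) :- dist(x, d), edge(x, y, w).
       where dist is a 'min' limit predicate: dist(v, k) means the
       distance to v is at most k, so only the minimal k per vertex
       is kept. Terminates for nonnegative edge weights."""
    dist = {source: 0}
    changed = True
    while changed:
        changed = False
        for (x, y, w) in edges:
            if x in dist:
                cand = dist[x] + w
                if y not in dist or cand < dist[y]:
                    dist[y] = cand  # tighter limit: replace, don't accumulate
                    changed = True
    return dist

edges = [("a", "b", 1), ("b", "c", 2), ("a", "c", 5)]
print(min_limit_fixpoint(edges, "a"))  # {'a': 0, 'b': 1, 'c': 3}
```

Keeping only the limit value per vertex is what makes the materialisation finite even though the rule can derive unboundedly many numeric facts.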
Wrapper Maintenance: A Machine Learning Approach
The proliferation of online information sources has led to an increased use
of wrappers for extracting data from Web sources. While most of the previous
research has focused on quick and efficient generation of wrappers, the
development of tools for wrapper maintenance has received less attention. This
is an important research problem because Web sources often change in ways that
prevent the wrappers from extracting data correctly. We present an efficient
algorithm that learns structural information about data from positive examples
alone. We describe how this information can be used for two wrapper maintenance
applications: wrapper verification and reinduction. The wrapper verification
system detects when a wrapper is not extracting correct data, usually because
the Web source has changed its format. The reinduction algorithm automatically
recovers from changes in the Web source by identifying data on Web pages so
that a new wrapper may be generated for this source. To validate our approach,
we monitored 27 wrappers over a period of a year. The verification algorithm
correctly discovered 35 of the 37 wrapper changes, and made 16 mistakes,
resulting in precision of 0.73 and recall of 0.95. We validated the reinduction
algorithm on ten Web sources. We were able to successfully reinduce the
wrappers, obtaining precision and recall values of 0.90 and 0.80 on the data
extraction task.
Telling Cause from Effect using MDL-based Local and Global Regression
We consider the fundamental problem of inferring the causal direction between
two univariate numeric random variables X and Y from observational data.
The two-variable case is especially difficult to solve since it is not possible
to use standard conditional independence tests between the variables.
To tackle this problem, we follow an information-theoretic approach based on
Kolmogorov complexity and use the Minimum Description Length (MDL) principle to
provide a practical solution. In particular, we propose a compression scheme to
encode local and global functional relations using MDL-based regression. We
infer that X causes Y in case it is shorter to describe Y as a function of X
than the inverse direction. In addition, we introduce Slope, an efficient
linear-time algorithm that, as we show through thorough empirical evaluation on
both synthetic and real-world data, outperforms the state of the art by a
wide margin.
Comment: 10 pages, to appear in ICDM17
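The MDL decision rule can be illustrated with a much cruder stand-in for the paper's compression scheme: fit a regression in each direction and compare code lengths, here a BIC-style parameter cost plus a Gaussian residual cost. The function names, the polynomial model, and the degree-3 choice are illustrative assumptions, not the paper's encoding:

```python
import numpy as np

def mdl_score(x, y, degree=3):
    """Code length (in bits, up to constants) of describing y as a
    polynomial function of x: a parameter cost for the coefficients
    plus a Gaussian data cost for the residuals."""
    coeffs = np.polyfit(x, y, degree)
    resid = y - np.polyval(coeffs, x)
    n = len(x)
    var = max(resid.var(), 1e-12)
    data_bits = 0.5 * n * np.log2(2 * np.pi * np.e * var)
    model_bits = 0.5 * (degree + 1) * np.log2(n)  # BIC-style parameter cost
    return data_bits + model_bits

def infer_direction(x, y):
    """Infer X -> Y if describing Y as a function of X is cheaper
    than the reverse; otherwise X <- Y."""
    return "X->Y" if mdl_score(x, y) < mdl_score(y, x) else "X<-Y"
```

On data generated as Y = X^2 plus small noise, describing Y as a function of X compresses far better than the reverse (X is two-valued for each Y), so the rule recovers the generating direction.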