Explicit probabilistic models for databases and networks
Recent work in data mining and related areas has highlighted the importance
of the statistical assessment of data mining results. Crucial to this endeavour
is the choice of a non-trivial null model for the data, against which the found
patterns can be contrasted. The most influential null models proposed so far
are defined in terms of invariants of the null distribution. Such null models
can be used by computation-intensive randomization approaches to estimate the
statistical significance of data mining results.
Here, we introduce a methodology to construct non-trivial probabilistic
models based on the maximum entropy (MaxEnt) principle. We show how MaxEnt
models allow for the natural incorporation of prior information. Furthermore,
they satisfy a number of desirable properties of previously introduced
randomization approaches. Lastly, they also have the benefit that they can be
represented explicitly. We argue that our approach can be used for a variety of
data types. However, for concreteness, we have chosen to demonstrate it in
particular for databases and networks.
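A minimal sketch of the kind of explicit MaxEnt model the abstract describes, under one common choice of prior information: for a binary database (an m × n 0/1 matrix) whose expected row and column sums are constrained to equal the observed ones, the MaxEnt distribution factorizes into independent Bernoulli entries with logistic probabilities. All names below (fit_maxent, the toy data) are illustrative, and gradient ascent is just one simple way to fit the Lagrange multipliers.

```python
import numpy as np

def fit_maxent(D, steps=2000, lr=0.5):
    """Fit MaxEnt Bernoulli probabilities P whose expected row/column sums
    match the observed ones; r and c are the Lagrange multipliers."""
    m, n = D.shape
    row_sums, col_sums = D.sum(axis=1), D.sum(axis=0)
    r, c = np.zeros(m), np.zeros(n)
    for _ in range(steps):
        P = 1.0 / (1.0 + np.exp(-(r[:, None] + c[None, :])))
        r += lr * (row_sums - P.sum(axis=1)) / n   # push expected row sums
        c += lr * (col_sums - P.sum(axis=0)) / m   # and column sums to match
    return P  # explicit, entry-wise probabilities of the null model

rng = np.random.default_rng(0)
D = (rng.random((20, 30)) < 0.3).astype(int)       # toy binary database
P = fit_maxent(D)
print(np.abs(P.sum(axis=1) - D.sum(axis=1)).max()) # near 0 after fitting
```

Because the fitted P is available entry-wise, pattern significance can be computed from the model directly rather than only via repeated randomizations, which is one sense in which such a model is "represented explicitly".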
Chess Endgames and Neural Networks
The existence of endgame databases challenges us to extract higher-grade information and knowledge from their basic data content. Chess players, for example, would like simple and usable endgame theories, if such a holy grail exists: endgame experts would like to provide such insights and to be inspired by computers to do so. Here, we investigate the use of artificial neural networks (NNs) to mine these databases, and we report on a first use of NNs on the KPK endgame. The results encourage us to suggest further work on chess applications of neural networks and other data-mining techniques.
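As a hedged illustration of the setup described (not the paper's actual architecture or feature encoding), the sketch below trains a small feed-forward network on KPK-style positions. The 4-feature encoding and the random placeholder labels are assumptions; real labels would be the win/draw values read from the KPK tablebase.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
n = 5000
X = np.column_stack([
    rng.integers(0, 64, n),   # white king square (0..63)
    rng.integers(8, 56, n),   # white pawn square (ranks 2..7)
    rng.integers(0, 64, n),   # black king square
    rng.integers(0, 2, n),    # side to move
])
y = rng.integers(0, 2, n)     # placeholder for tablebase win/draw labels

clf = MLPClassifier(hidden_layer_sizes=(32, 32), max_iter=300, random_state=0)
clf.fit(X, y)
print(f"training accuracy: {clf.score(X, y):.2f}")
```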
Implementing Database Coordination in P2P Networks
We are interested in the interaction of databases in Peer-to-Peer (P2P) networks. In this paper we propose a new solution for P2P databases, which we call database coordination. We see coordination as managing semantic interdependencies among databases at runtime. We propose a data coordination model in which the notions of Interest Groups and Acquaintances play the most crucial role: Interest Groups support the formation of groups of peers according to the data models they have in common, and Acquaintances allow peers to inter-operate. Finally, we present an architecture supporting database coordination and show how it is implemented on top of JXTA.
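A minimal sketch of the two notions named in the abstract, with assumed class and attribute names (the paper's implementation sits on top of JXTA; nothing below uses its API). For illustration, joining an Interest Group here acquaints the newcomer with all current members; in the paper, Acquaintances may well be a separate, finer-grained relation.

```python
from dataclasses import dataclass, field

@dataclass
class Peer:
    name: str
    acquaintances: set = field(default_factory=set)  # names of known peers

@dataclass
class InterestGroup:
    data_model: str                                  # shared data model
    members: dict = field(default_factory=dict)      # name -> Peer

    def join(self, peer: Peer):
        # illustrative policy: every existing member becomes an
        # acquaintance of the newcomer, and vice versa
        for other in self.members.values():
            other.acquaintances.add(peer.name)
            peer.acquaintances.add(other.name)
        self.members[peer.name] = peer

group = InterestGroup(data_model="hospital-records")
for name in ("p1", "p2", "p3"):
    group.join(Peer(name))
print(group.members["p1"].acquaintances)  # {'p2', 'p3'} (order may vary)
```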
Quantifying the consistency of scientific databases
Science is a social process with far-reaching impact on our modern society. In recent years, for the first time, we are able to study science itself scientifically. This is enabled by the massive amounts of data on scientific publications that are increasingly becoming available. The data are contained in several databases, such as Web of Science or PubMed, maintained by various public and private entities. Unfortunately, these databases are not always consistent, which considerably hinders this study. Relying on the powerful framework of complex networks, we conduct a systematic analysis of the consistency among six major scientific databases. We find that identifying a single "best" database is far from easy. Nevertheless, our results indicate appreciable differences in the mutual consistency of different databases, which we interpret as recipes for future bibliometric studies.
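One simple way to make pairwise consistency concrete (an illustrative measure, not necessarily the one used in the paper): treat each database as a set of citation edges over the same papers and compare the edge sets with the Jaccard index.

```python
def jaccard(edges_a, edges_b):
    """Jaccard index of two edge sets: |A ∩ B| / |A ∪ B|."""
    a, b = set(edges_a), set(edges_b)
    return len(a & b) / len(a | b) if a | b else 1.0

# toy citation edges (citing_id, cited_id) as two databases record them
db1 = [("A", "B"), ("A", "C"), ("B", "C")]
db2 = [("A", "B"), ("B", "C"), ("B", "D")]
print(f"edge-set consistency: {jaccard(db1, db2):.2f}")  # 0.50
```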