
Facticity as the amount of self-descriptive information in a data set

Author: Adriaans Pieter
Publication date: 01/01/2012

Using the theory of Kolmogorov complexity, the notion of facticity $\phi(x)$ of a string is defined as the amount of self-descriptive information it contains. It is proved that, under reasonable assumptions (the existence of an empty machine and the availability of a faithful index), facticity is definite, i.e. random strings have facticity 0 and for compressible strings $0 < \phi(x) < \frac{1}{2}|x| + O(1)$. Consequently, facticity objectively measures the tension in a data set between structural and ad-hoc information. For binary strings there is a so-called facticity threshold that depends on their entropy. Strings with facticity above this threshold have no optimal stochastic model and are essentially computational. The shape of the facticity versus entropy plot coincides with the well-known sawtooth curves observed in complex systems. The notion of factic processes is discussed. This approach overcomes problems with earlier proposals to use two-part code to define the meaningfulness or usefulness of a data set.
Comment: 10 pages, 2 figures
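The abstract above rests on Kolmogorov complexity K(x), which is uncomputable. A common practical stand-in (an assumption here, not the paper's method) is the length of a string under a real compressor, which upper-bounds K(x) up to an additive constant. The minimal Python sketch below uses zlib only to illustrate the distinction the abstract draws between random strings and compressible ones; it does not compute facticity φ(x), and the helper names are hypothetical.

```python
import os
import zlib


def compressed_size_bits(data: bytes) -> int:
    """Length in bits of the zlib-compressed data.

    A crude, computable upper-bound proxy for K(x); K itself is uncomputable,
    and zlib is only an illustrative choice of compressor.
    """
    return 8 * len(zlib.compress(data, 9))


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by raw size.

    Values near (or slightly above) 1.0 mean the compressor found no structure,
    as expected for random data; small values indicate regularity.
    """
    return compressed_size_bits(data) / (8 * len(data))


if __name__ == "__main__":
    structured = b"01" * 5000         # highly regular 10,000-byte string
    random_like = os.urandom(10000)   # incompressible with overwhelming probability
    print(f"structured:  {compression_ratio(structured):.3f}")
    print(f"random-like: {compression_ratio(random_like):.3f}")
```

A compressor only ever gives an upper bound, so a low ratio is evidence of structure, while a ratio near 1.0 merely means this particular compressor found none.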

arXiv.org e-Print Archive

UvA-DARE

International Migration, Integration and Social Cohesion online publications

A Critical Analysis of Floridi’s Theory of Semantic Information

Author: A Goldman
EL Gettier
L Floridi
M Heidegger
N Chater
N Chomsky
PD Grünwald
Pieter Adriaans
PW Adriaans
R Cilibrasi
R Montague
S Haack
S Lloyd
TM Cover
TM Mitchell
Publication venue: Springer Science and Business Media LLC

Crossref

Overview of BioCreative II gene mention recognition

Nineteen teams presented results for the Gene Mention Task at the BioCreative II Workshop. In this task, participants designed systems to identify substrings in sentences corresponding to gene name mentions. A variety of methods were used, and the results varied, with the highest achieved F1 score being 0.8721. Here we present brief descriptions of all the methods used and a statistical analysis of the results. We also demonstrate that, by combining the results from all submissions, an F score of 0.9066 is feasible, and furthermore that the best result makes use of the lowest scoring submissions.
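The F1 scores quoted above are the harmonic mean of precision and recall. As a hedged sketch, the Python below computes them from true positives, false positives and false negatives, treating each gene mention as a hypothetical (sentence_id, start, end) span scored by exact match. That matching rule is a simplifying assumption; the official BioCreative II evaluation had its own matching rules, which this sketch does not reproduce.

```python
Span = tuple[str, int, int]  # hypothetical (sentence_id, start, end) mention span


def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Precision, recall and F1 (harmonic mean), guarding against division by zero."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def score_mentions(gold: set[Span], predicted: set[Span]) -> tuple[float, float, float]:
    """Exact-match scoring: a predicted span counts only if it equals a gold span."""
    tp = len(gold & predicted)   # predicted spans that are in the gold standard
    fp = len(predicted - gold)   # spurious predictions
    fn = len(gold - predicted)   # gold mentions the system missed
    return precision_recall_f1(tp, fp, fn)


if __name__ == "__main__":
    # Toy, made-up annotations for illustration only.
    gold = {("s1", 0, 4), ("s1", 10, 15), ("s2", 3, 8)}
    predicted = {("s1", 0, 4), ("s2", 3, 8), ("s2", 20, 25)}
    p, r, f1 = score_mentions(gold, predicted)
    print(f"P={p:.3f} R={r:.3f} F1={f1:.3f}")  # prints: P=0.667 R=0.667 F1=0.667
```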

epublications@Marquette

Fraunhofer-ePrints

PubMed Central

Edinburgh Research Explorer

Publications at Bielefeld University

Apollo (Cambridge)

White Rose Research Online

UvA-DARE

International Migration, Integration and Social Cohesion online publications

Structuring and extracting knowledge for the support of hypothesis generation in molecular biology

Author: A Gomez-Perez
Andrew P Gibson
B Smith
C Goble
CA Goble
CD Manning
CJ Mungall
DA Moreira
DL Rubin
E Neumann
Edgar Meij
EJ Meij
G Antoniou
I Spasic
IH Witten
J Broekstra
JA Kors
Konstantinos Krommydas
LD Stein
LJ Post
M Ashburner
M Missikoff
M Scott Marshall
M Weeber
MA Inda
Marco Roos
Martijn Schuemie
O Tuason
P Fisher
P Missier
P Romano
Pieter W Adriaans
PJ Verschure
R Hoehndorf
R Jelier
R Stevens
R Witte
S Jupp
S Katrenko
Sophia Katrenko
T Clark
Willem Robert van Hage
Publication venue: BioMed Central
Publication date: 01/01/2009

Background: Hypothesis generation in molecular and cellular biology is an empirical process in which knowledge derived from prior experiments is distilled into a comprehensible model. The need for automated support is exemplified by the difficulty of considering all relevant facts that are contained in the millions of documents available from PubMed. The Semantic Web provides tools for sharing prior knowledge, while information retrieval and information extraction techniques enable its extraction from literature. Their combination makes prior knowledge available for computational analysis and inference. While some tools provide complete solutions that limit control over the modeling and extraction processes, we seek a methodology that supports control by the experimenter over these critical processes.
Results: We describe progress towards automated support for the generation of biomolecular hypotheses. Semantic Web technologies are used to structure and store knowledge, while a workflow extracts knowledge from text. We designed minimal proto-ontologies in OWL for capturing different aspects of a text mining experiment: the biological hypothesis, text and documents, text mining, and workflow provenance. The models fit a methodology that allows a focus on the requirements of a single experiment while supporting reuse and posterior analysis of extracted knowledge from multiple experiments. Our workflow is composed of services from the 'Adaptive Information Disclosure Application' (AIDA) toolkit as well as a few others. The output is a semantic model with putative biological relations, with each relation linked to the corresponding evidence.
Conclusion: We demonstrated a 'do-it-yourself' approach for structuring and extracting knowledge in the context of experimental research on biomolecular mechanisms. The methodology can be used to bootstrap the construction of semantically rich biological models using the results of knowledge extraction processes. Models specific to particular experiments can be constructed that, in turn, link with other semantic models, creating a web of knowledge that spans experiments. Mapping mechanisms can link to other knowledge resources such as OBO ontologies or SKOS vocabularies. AIDA Web Services can be used to design personalized knowledge extraction procedures. In our example experiment, we found three proteins (NF-Kappa B, p21, and Bax) potentially playing a role in the interplay between nutrients and epigenetic gene regulation.

Crossref

VU Research Portal

Springer - Publisher Connector

PubMed Central

EUR Research Repository

UvA-DARE

International Migration, Integration and Social Cohesion online publications

Industrial requirements for ML application technology

Author: Pieter Adriaans Syllogic
Pieter W. Adriaans

… this paper, it is clear that we have only a very rudimentary idea of a methodology for ML. One might ask why the ML community has made so little progress in comparison to, e.g., the database community. There are standard methods for the design and implementation of a database; such things have not yet been realised for ML. The main reason may be that a methodology for ML is much harder because there are fundamental philosophical issues on a more abstract level that have not been solved. A general theory of scientific heuristics, which would be a basis for such a methodology, is missing. Many issues concerning a methodology for ML are really basic issues in the methodology of science in disguise. My general feeling is that it will be difficult to formulate a methodology for ML if the underlying general patterns of scientific heuristics and creativity are not well understood. Recent work in complexity theory and the methodology of science may help us here.

CiteSeerX

The Power and Perils of MDL

Author: Paul Vitányi
Pieter Adriaans
Publication date: 01/01/2007

We point out a potential weakness in the application of the celebrated Minimum Description Length (MDL) principle for model selection. Specifically, it is shown that (although the index of the model class which actually minimizes a two-part code has many desirable properties) a model which has a shorter two-part code length than another is not necessarily better (unless of course it achieves the global minimum). This is illustrated by an application to infer a grammar (DFA) from positive examples. We also analyze computability issues and robustness under recoding of the data. Generally, the classical approach is inadequate to express the goodness-of-fit of individual models for individual data sets. In practice, however, this is precisely what we are interested in: both to express the goodness of a procedure and where and how it can fail. To achieve this practical goal, we paradoxically have to use the supposedly impractical vehicle of Kolmogorov complexity.
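To make "two-part code length" concrete, here is a minimal Python sketch under stated assumptions: a toy model class for binary strings, not the DFA grammar setting of the paper. The code length is L(H) + L(D|H), the bits to describe the model plus the bits to describe the data given the model. The two hypothetical models are a "literal" model (no model bits, 1 bit per symbol) and a "count" model (send the number of ones k, then the index of the string among all strings with k ones). The constant cost of saying which model is used is ignored.

```python
import math


def two_part_literal(x: str) -> float:
    """Hypothetical 'literal' model: L(H) = 0, L(D|H) = 1 bit per symbol."""
    return float(len(x))


def two_part_count(x: str) -> float:
    """Hypothetical 'count' model for binary strings.

    Model part L(H): the number of ones k, encoded in log2(n + 1) bits
    (k can be any value in 0..n).
    Data part L(D|H): the index of x among all C(n, k) strings of length n
    with exactly k ones, i.e. log2 C(n, k) bits.
    """
    n, k = len(x), x.count("1")
    model_bits = math.log2(n + 1)
    data_bits = math.log2(math.comb(n, k))
    return model_bits + data_bits


if __name__ == "__main__":
    candidates = {
        "skewed": "0" * 90 + "1" * 10,
        "alternating": "01" * 50,
    }
    for name, x in candidates.items():
        lit, cnt = two_part_literal(x), two_part_count(x)
        best = "count" if cnt < lit else "literal"  # MDL picks the shorter total
        print(f"{name:12s} literal={lit:6.1f}  count={cnt:6.1f}  -> {best}")
```

The skewed string gets a much shorter code under the count model, while the alternating string, although highly regular, does not: this model class cannot see the pattern, so MDL can only choose among the models it is given.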

CiteSeerX

Crossref

International Migration, Integration and Social Cohesion online publications

Finding Constraints for Semantic Relations via Clustering

Author: Adriaans Pieter
Katrenko Sophia
Publication date: 01/01/2010

Automatic recognition of semantic relations constitutes an important part of information extraction. Many existing information extraction systems rely on syntactic information found in a sentence to accomplish this task. In this paper, we look into relation arguments and claim that some semantic relations can be described by constraints imposed on them. This information would provide more insight into the nature of semantic relations and could be further combined with the evidence found in a sentence to arrive at actual extractions.

Utrecht University Repository

International Migration, Integration and Social Cohesion online publications