3,106 research outputs found
Describing the complexity of systems: multi-variable "set complexity" and the information basis of systems biology
Context dependence is central to the description of complexity. Keying on the
pairwise definition of "set complexity" we use an information theory approach
to formulate general measures of systems complexity. We examine the properties
of multi-variable dependency starting with the concept of interaction
information. We then present a new measure for unbiased detection of
multi-variable dependency, "differential interaction information." This
quantity for two variables reduces to the pairwise "set complexity" previously
proposed as a context-dependent measure of information in biological systems.
We generalize it here to an arbitrary number of variables. Critical limiting
properties of the "differential interaction information" are key to the
generalization. This measure extends previous ideas about biological
information and provides a more sophisticated basis for study of complexity.
The properties of "differential interaction information" also suggest new
approaches to data analysis. Given a data set of system measurements
differential interaction information can provide a measure of collective
dependence, which can be represented in hypergraphs describing complex system
interaction patterns. We investigate this kind of analysis using simulated data
sets. The conjoining of a generalized set complexity measure, multi-variable
dependency analysis, and hypergraphs is our central result. While our focus is
on complex biological systems, our results are applicable to any complex
system.Comment: 44 pages, 12 figures; made revisions after peer revie
Normalized Web Distance and Word Similarity
There is a great deal of work in cognitive psychology, linguistics, and
computer science, about using word (or phrase) frequencies in context in text
corpora to develop measures for word similarity or word association, going back
to at least the 1960s. The goal of this chapter is to introduce the
normalizedis a general way to tap the amorphous low-grade knowledge available
for free on the Internet, typed in by local users aiming at personal
gratification of diverse objectives, and yet globally achieving what is
effectively the largest semantic electronic database in the world. Moreover,
this database is available for all by using any search engine that can return
aggregate page-count estimates for a large range of search-queries. In the
paper introducing the NWD it was called `normalized Google distance (NGD),' but
since Google doesn't allow computer searches anymore, we opt for the more
neutral and descriptive NWD. web distance (NWD) method to determine similarity
between words and phrases. ItComment: Latex, 20 pages, 7 figures, to appear in: Handbook of Natural
Language Processing, Second Edition, Nitin Indurkhya and Fred J. Damerau
Eds., CRC Press, Taylor and Francis Group, Boca Raton, FL, 2010, ISBN
978-142008592
Exploiting semantic annotations for open information extraction: an experience in the biomedical domain
The increasing amount of unstructured text published on the Web is demanding new tools and methods to automatically process and extract relevant information. Traditional information extraction has focused on harvesting domain-specific, pre-specified relations, which usually requires manual labor and heavy machinery; especially in the biomedical domain, the main efforts have been directed toward the recognition of well-defined entities such as genes or proteins, which constitutes the basis for extracting the relationships between the recognized entities. The intrinsic features and scale of the Web demand new approaches able to cope with the diversity of documents, where the number of relations is unbounded and not known in advance. This paper presents a scalable method for the extraction of domain-independent relations from text that exploits the knowledge in the semantic annotations. The method is not geared to any specific domain (e.g., protein–protein interactions and drug–drug interactions) and does not require any manual input or deep processing. Moreover, the method uses the extracted relations to compute groups of abstract semantic relations characterized by their signature types and synonymous relation strings. This constitutes a valuable source of knowledge when constructing formal knowledge bases, as we enable seamless integration of the extracted relations with the available knowledge resources through the process of semantic annotation. The proposed approach has successfully been applied to a large text collection in the biomedical domain and the results are very encouraging.The work was supported by the CICYT project TIN2011-24147 from the Spanish Ministry of Economy and Competitiveness (MINECO)
Tight Bounds for Communication-Assisted Agreement Distillation
Suppose Alice holds a uniformly random string X in {0,1}^N and Bob holds a noisy version Y of X where each bit of X is flipped independently with probability epsilon in [0,1/2]. Alice and Bob would like to extract a common random string of min-entropy at least k. In this work, we establish the communication versus success probability trade-off for this problem by giving a protocol and a matching lower bound (under the restriction that the string to be agreed upon is determined by Alice\u27s input X). Specifically, we prove that in order for Alice and Bob to agree on a common string with probability 2^{-gamma k} (gamma k >= 1), the optimal communication (up to o(k) terms, and achievable for large N) is precisely (C *(1-gamma) - 2 * sqrt{ C * (1-C) gamma}) * k, where C := 4 * epsilon * (1-epsilon). In particular, the optimal communication to achieve Omega(1) agreement probability approaches 4 * epsilon * (1-epsilon) * k.
We also consider the case when Y is the output of the binary erasure channel on X, where each bit of Y equals the corresponding bit of X with probability 1-epsilon and is otherwise erased (that is, replaced by a "?"). In this case, the communication required becomes (epsilon * (1-gamma) - 2 * sqrt{ epsilon * (1-epsilon) * gamma}) * k. In particular, the optimal communication to achieve Omega(1) agreement probability approaches epsilon * k, and with no communication the optimal agreement probability approaches 2^{- (1-sqrt{1-epsilon})/(1+sqrt{1-epsilon}) * k}.
Our protocols are based on covering codes and extend the approach of (Bogdanov and Mossel, 2011) for the zero-communication case. Our lower bounds rely on hypercontractive inequalities. For the model of bit-flips, our argument extends the approach of (Bogdanov and Mossel, 2011) by allowing communication; for the erasure model, to the best of our knowledge the needed hypercontractivity statement was not studied before, and it was established (given our application) by (Nair and Wang 2015). We also obtain information complexity lower bounds for these tasks, and together with our protocol, they shed light on the recently popular "most informative Boolean function" conjecture of Courtade and Kumar
Effective Field Theory Methods in Gravitational Physics and Tests of Gravity
In this PhD thesis I make use of the "Effective Field Theory of Gravity for
Extended Objects" by Goldberger and Rothstein in order to investigate theories
of gravity and to take a different point of view on the physical information
that can be extracted from experiments. In the first work I present, I study a
scalar-tensor theory of gravity and I address the renormalization of the
energy-momentum tensor for point-like and string-like sources. The second and
third study I report are set in the context of testing gravity. So far
experiments have probed dynamical regimes only up to order (v/c)^5 in the
post-Newtonian expansion, which corresponds to the very first term of the
radiative sector in General Relativity. In contrast, by means of
gravitational-wave astronomy, one aims at testing General Relativity up to
(v/c)^(12)! It is then relevant to envisage testing frameworks which are
appropriate to this strong-field/radiative regime. In the last two chapters of
this thesis a new such framework is presented. Using the effective field theory
approach, General Relativity non-linearities are described by Feynman diagrams
in which classical gravitons interact with matter sources and among themselves.
Tagging the self-interaction vertices of gravitons with parameters it is
possible, for example, to translate the measure of the period decay of
Hulse-Taylor pulsar in a constraint on the three-graviton vertex at the 0.1%
level; for comparison, LEP constraints on the triple-gauge-boson couplings of
weak interactions are accurate at 3%. With future observations of gravitational
waves, higher order graviton vertices can in principle be constrained through a
Fisher matrix analysis.Comment: This PhD Thesis has been conducted at the University of Geneva
(Switzerland) under the direction of Professor Michele Maggiore and the
codirection of Doctor Riccardo Sturani. Version 2: abstract slightly changed;
one typo corrected; layout issue fixe
- …