Describing the complexity of systems: multi-variable "set complexity" and the information basis of systems biology

Author: Galas David J.
Ignac Tomasz
Sakhanenko Nikita A.
Skupin Alexander
Publication date: 19/08/2013

Context dependence is central to the description of complexity. Keying on the pairwise definition of "set complexity", we use an information theory approach to formulate general measures of systems complexity. We examine the properties of multi-variable dependency starting with the concept of interaction information. We then present a new measure for unbiased detection of multi-variable dependency, "differential interaction information." This quantity for two variables reduces to the pairwise "set complexity" previously proposed as a context-dependent measure of information in biological systems. We generalize it here to an arbitrary number of variables. Critical limiting properties of the "differential interaction information" are key to the generalization. This measure extends previous ideas about biological information and provides a more sophisticated basis for the study of complexity. The properties of "differential interaction information" also suggest new approaches to data analysis. Given a data set of system measurements, differential interaction information can provide a measure of collective dependence, which can be represented in hypergraphs describing complex system interaction patterns. We investigate this kind of analysis using simulated data sets. The conjoining of a generalized set complexity measure, multi-variable dependency analysis, and hypergraphs is our central result. While our focus is on complex biological systems, our results are applicable to any complex system. Comment: 44 pages, 12 figures; made revisions after peer review

arXiv.org e-Print Archive

Crossref

PubMed Central

Open Repository and Bibliography - Luxembourg
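
The interaction information that the Galas et al. abstract starts from can be estimated from discrete data by inclusion-exclusion over joint entropies. The Python sketch below is illustrative only: it uses one common sign convention (co-information, which equals mutual information for two variables) that may differ from the paper's, it does not implement the paper's "differential interaction information", and the helper `hyperedges` with its `threshold` parameter is a hypothetical way to turn strong three-way dependencies into hypergraph edges.

```python
from collections import Counter
from itertools import combinations
from math import log2


def entropy(rows, cols):
    """Empirical Shannon entropy (bits) of the joint variable formed by columns `cols`."""
    counts = Counter(tuple(row[c] for c in cols) for row in rows)
    n = len(rows)
    return -sum((k / n) * log2(k / n) for k in counts.values())


def co_information(rows, cols):
    """Interaction information by inclusion-exclusion over joint entropies.

    Sign convention (assumed; conventions differ): for two variables this is
    the mutual information I(X;Y); for three it is I(X;Y) - I(X;Y|Z).
    """
    total = 0.0
    for r in range(1, len(cols) + 1):
        for subset in combinations(cols, r):
            total += (-1) ** (r + 1) * entropy(rows, subset)
    return total


def hyperedges(rows, n_vars, size=3, threshold=0.5):
    """Hypothetical helper: variable subsets whose |interaction information|
    reaches `threshold` become hyperedges of a dependency hypergraph."""
    edges = []
    for subset in combinations(range(n_vars), size):
        value = co_information(rows, subset)
        if abs(value) >= threshold:
            edges.append((subset, value))
    return edges


# XOR example: X and Y independent and uniform, Z = X xor Y.
xor_rows = [(x, y, x ^ y) for x in (0, 1) for y in (0, 1)]
print(co_information(xor_rows, (0, 1)))     # 0.0: X and Y are pairwise independent
print(co_information(xor_rows, (0, 1, 2)))  # -1.0: purely three-way dependence
print(hyperedges(xor_rows, n_vars=3))       # [((0, 1, 2), -1.0)]
```

On the XOR data the pairwise term is zero while the three-way term is -1 bit, so an analysis restricted to pairs would miss a dependency that a single hyperedge over all three variables captures.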

Normalized Web Distance and Word Similarity

Author: Cilibrasi Rudi L.
Vitanyi Paul M. B.
Publication date: 01/01/2009

There is a great deal of work in cognitive psychology, linguistics, and computer science about using word (or phrase) frequencies in context in text corpora to develop measures for word similarity or word association, going back to at least the 1960s. The goal of this chapter is to introduce the normalized web distance (NWD) method to determine similarity between words and phrases. It is a general way to tap the amorphous low-grade knowledge available for free on the Internet, typed in by local users aiming at personal gratification of diverse objectives, and yet globally achieving what is effectively the largest semantic electronic database in the world. Moreover, this database is available to all through any search engine that can return aggregate page-count estimates for a large range of search queries. In the paper introducing the NWD it was called "normalized Google distance (NGD)", but since Google doesn't allow computer searches anymore, we opt for the more neutral and descriptive NWD. Comment: LaTeX, 20 pages, 7 figures, to appear in: Handbook of Natural Language Processing, Second Edition, Nitin Indurkhya and Fred J. Damerau Eds., CRC Press, Taylor and Francis Group, Boca Raton, FL, 2010, ISBN 978-142008592

arXiv.org e-Print Archive

CiteSeerX

CWI's Institutional Repository

International Migration, Integration and Social Cohesion online publications
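
The NWD can be computed from four numbers a search engine returns: the page count for each term, the count for pages containing both, and a normalizing total N. A minimal Python sketch of the standard formula, NWD(x, y) = (max{log f(x), log f(y)} - log f(x, y)) / (log N - min{log f(x), log f(y)}); `nwd` is a hypothetical helper name and the counts in the usage lines are made up for illustration.

```python
import math


def nwd(f_x, f_y, f_xy, n):
    """Normalized web distance from aggregate page counts.

    f_x, f_y: pages containing x (resp. y); f_xy: pages containing both;
    n: normalizing total, roughly the number of pages indexed. The log base
    cancels in the ratio, so natural logs are used.
    """
    if f_xy == 0:
        # The terms never co-occur: treat them as infinitely far apart.
        return math.inf
    lx, ly, lxy = math.log(f_x), math.log(f_y), math.log(f_xy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))


# Made-up counts, for illustration only.
print(nwd(f_x=100, f_y=1_000, f_xy=10, n=10_000))      # approximately 1.0
print(nwd(f_x=1_000, f_y=1_000, f_xy=1_000, n=10**6))  # 0.0: the terms always co-occur
```

Smaller values indicate terms that co-occur more often relative to their individual frequencies; when x, y, and the pair have identical counts the distance is 0.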

Exploiting semantic annotations for open information extraction: an experience in the biomedical domain

Author: Berlanga Llavori Rafael
Nebot Romero Victoria
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014

The increasing amount of unstructured text published on the Web demands new tools and methods to automatically process and extract relevant information. Traditional information extraction has focused on harvesting domain-specific, pre-specified relations, which usually requires manual labor and heavy machinery; especially in the biomedical domain, the main efforts have been directed toward the recognition of well-defined entities such as genes or proteins, which constitutes the basis for extracting the relationships between the recognized entities. The intrinsic features and scale of the Web demand new approaches able to cope with the diversity of documents, where the number of relations is unbounded and not known in advance. This paper presents a scalable method for the extraction of domain-independent relations from text that exploits the knowledge in the semantic annotations. The method is not geared to any specific domain (e.g., protein–protein or drug–drug interactions) and does not require any manual input or deep processing. Moreover, the method uses the extracted relations to compute groups of abstract semantic relations characterized by their signature types and synonymous relation strings. This constitutes a valuable source of knowledge when constructing formal knowledge bases, as we enable seamless integration of the extracted relations with the available knowledge resources through the process of semantic annotation. The proposed approach has been successfully applied to a large text collection in the biomedical domain, and the results are very encouraging. The work was supported by the CICYT project TIN2011-24147 from the Spanish Ministry of Economy and Competitiveness (MINECO)

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Repositori Institucional de la Universitat Jaume I
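
The idea of grouping extracted relations by their signature types can be illustrated with a toy Python sketch. It is not the authors' method: it assumes sentences have already been semantically annotated (each token tagged with a type, or None), takes the unannotated words between two consecutive annotated entities as the relation string, and groups the strings under the (type, type) signature. Synonym grouping and linking to knowledge resources, which the paper describes, are not attempted; the token format, function names, and example sentences are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Iterator, List, Optional, Tuple

# Hypothetical input format: each token is (word, semantic type), with type
# None for words that are not part of an annotated entity.
Token = Tuple[str, Optional[str]]
Signature = Tuple[str, str]


def extract_relations(sentence: List[Token]) -> Iterator[Tuple[Signature, str]]:
    """Yield (signature, relation string) for each pair of consecutive annotated
    entities; the relation string is the unannotated text between them."""
    last_type: Optional[str] = None
    between: List[str] = []
    for word, sem_type in sentence:
        if sem_type is None:
            if last_type is not None:
                between.append(word.lower())
            continue
        if last_type is not None and between:
            yield (last_type, sem_type), " ".join(between)
        last_type, between = sem_type, []


def group_by_signature(sentences):
    """Group relation strings (with counts) under their signature types."""
    groups = defaultdict(Counter)
    for sentence in sentences:
        for signature, relation in extract_relations(sentence):
            groups[signature][relation] += 1
    return groups


corpus = [
    [("Aspirin", "Drug"), ("inhibits", None), ("COX-1", "Protein")],
    [("Ibuprofen", "Drug"), ("inhibits", None), ("COX-2", "Protein")],
    [("Warfarin", "Drug"), ("interacts", None), ("with", None), ("aspirin", "Drug")],
]
for signature, relations in group_by_signature(corpus).items():
    print(signature, dict(relations))
# ('Drug', 'Protein') {'inhibits': 2}
# ('Drug', 'Drug') {'interacts with': 1}
```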

Tight Bounds for Communication-Assisted Agreement Distillation

Author: Guruswami Venkatesan
Radhakrishnan Jaikumar
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 31st Conference on Computational Complexity (CCC 2016)
Publication date: 01/01/2016

Suppose Alice holds a uniformly random string X in {0,1}^N and Bob holds a noisy version Y of X where each bit of X is flipped independently with probability epsilon in [0,1/2]. Alice and Bob would like to extract a common random string of min-entropy at least k. In this work, we establish the communication versus success probability trade-off for this problem by giving a protocol and a matching lower bound (under the restriction that the string to be agreed upon is determined by Alice's input X). Specifically, we prove that in order for Alice and Bob to agree on a common string with probability 2^{-gamma k} (gamma k >= 1), the optimal communication (up to o(k) terms, and achievable for large N) is precisely (C *(1-gamma) - 2 * sqrt{ C * (1-C) gamma}) * k, where C := 4 * epsilon * (1-epsilon). In particular, the optimal communication to achieve Omega(1) agreement probability approaches 4 * epsilon * (1-epsilon) * k. We also consider the case when Y is the output of the binary erasure channel on X, where each bit of Y equals the corresponding bit of X with probability 1-epsilon and is otherwise erased (that is, replaced by a "?"). In this case, the communication required becomes (epsilon * (1-gamma) - 2 * sqrt{ epsilon * (1-epsilon) * gamma}) * k. In particular, the optimal communication to achieve Omega(1) agreement probability approaches epsilon * k, and with no communication the optimal agreement probability approaches 2^{- (1-sqrt{1-epsilon})/(1+sqrt{1-epsilon}) * k}. Our protocols are based on covering codes and extend the approach of (Bogdanov and Mossel, 2011) for the zero-communication case. Our lower bounds rely on hypercontractive inequalities. For the model of bit-flips, our argument extends the approach of (Bogdanov and Mossel, 2011) by allowing communication; for the erasure model, to the best of our knowledge the needed hypercontractivity statement was not studied before, and it was established (given our application) by (Nair and Wang 2015). 
We also obtain information complexity lower bounds for these tasks, and together with our protocol, they shed light on the recently popular "most informative Boolean function" conjecture of Courtade and Kumar.

Dagstuhl Research Online Publication Server
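
The trade-off formulas in the abstract are easy to evaluate numerically. This Python sketch transcribes them for the bit-flip and erasure models; the function names are hypothetical and the results are leading-order in k, ignoring the o(k) terms. Clamping negative values at zero is an added assumption, though it is consistent with the abstract: setting the erasure expression to zero and solving gives gamma = (1 - sqrt(1-epsilon))^2 / epsilon = (1 - sqrt(1-epsilon)) / (1 + sqrt(1-epsilon)), exactly the zero-communication exponent quoted above.

```python
import math


def bitflip_communication(eps, gamma, k):
    """Communication (bits, leading order in k) needed for agreement probability
    2**(-gamma*k) when each bit of X is flipped with probability eps."""
    c = 4 * eps * (1 - eps)
    rate = c * (1 - gamma) - 2 * math.sqrt(c * (1 - c) * gamma)
    # Added assumption: a non-positive rate means no communication is needed.
    return max(0.0, rate) * k


def erasure_communication(eps, gamma, k):
    """Same trade-off when each bit of X is erased with probability eps."""
    rate = eps * (1 - gamma) - 2 * math.sqrt(eps * (1 - eps) * gamma)
    return max(0.0, rate) * k


def erasure_zero_comm_exponent(eps, k):
    """-log2 of the best zero-communication agreement probability (erasure model)."""
    r = math.sqrt(1 - eps)
    return (1 - r) / (1 + r) * k


print(bitflip_communication(eps=0.1, gamma=0.0, k=1000))  # about 360 bits (C = 0.36)
gamma_star = erasure_zero_comm_exponent(eps=0.75, k=1)    # about 1/3
print(erasure_communication(eps=0.75, gamma=gamma_star, k=1000))  # essentially 0
```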

Integrating Word Embedding Offsets into the Espresso System for Part-Whole Relation Extraction

Author: Matsumoto Yuji
Phi Van-Thuy
Publication venue: Hankookmunhwasa
Publication date: 01/01/2016

Waseda University Repository

Effective Field Theory Methods in Gravitational Physics and Tests of Gravity

Author: Cannella Umberto
Publication date: 01/01/2011

In this PhD thesis I make use of the "Effective Field Theory of Gravity for Extended Objects" by Goldberger and Rothstein in order to investigate theories of gravity and to take a different point of view on the physical information that can be extracted from experiments. In the first work I present, I study a scalar-tensor theory of gravity and address the renormalization of the energy-momentum tensor for point-like and string-like sources. The second and third studies I report are set in the context of testing gravity. So far, experiments have probed dynamical regimes only up to order (v/c)^5 in the post-Newtonian expansion, which corresponds to the very first term of the radiative sector in General Relativity. In contrast, by means of gravitational-wave astronomy, one aims at testing General Relativity up to (v/c)^(12)! It is then relevant to envisage testing frameworks that are appropriate to this strong-field/radiative regime. In the last two chapters of this thesis a new such framework is presented. Using the effective field theory approach, General Relativity non-linearities are described by Feynman diagrams in which classical gravitons interact with matter sources and among themselves. By tagging the graviton self-interaction vertices with parameters, it is possible, for example, to translate the measurement of the period decay of the Hulse-Taylor pulsar into a constraint on the three-graviton vertex at the 0.1% level; for comparison, LEP constraints on the triple-gauge-boson couplings of weak interactions are accurate at 3%. With future observations of gravitational waves, higher-order graviton vertices can in principle be constrained through a Fisher matrix analysis. Comment: This PhD Thesis has been conducted at the University of Geneva (Switzerland) under the direction of Professor Michele Maggiore and the codirection of Doctor Riccardo Sturani. Version 2: abstract slightly changed; one typo corrected; layout issue fixed

arXiv.org e-Print Archive

CERN Document Server

Archive ouverte UNIGE
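
The Fisher matrix analysis mentioned at the end of the abstract forecasts how tightly parameters can be constrained: with independent Gaussian noise of width sigma, F_ij = sum_t (d mu / d theta_i)(d mu / d theta_j) / sigma^2, and the marginal 1-sigma error on theta_i is sqrt((F^-1)_ii). The Python sketch below is a generic illustration under those assumptions, using numerical derivatives and a toy straight-line model as a stand-in for a gravitational waveform; it does not reproduce the thesis's model or its vertex parameters, and all names are hypothetical.

```python
import math


def fisher_matrix(model, theta, times, sigma, h=1e-6):
    """Fisher matrix F_ij = sum_t (dmu/dtheta_i)(dmu/dtheta_j) / sigma**2 for a model
    mu(t, theta) observed at `times` with independent Gaussian noise of width sigma.
    Derivatives are taken numerically by central differences with step h."""
    n = len(theta)
    F = [[0.0] * n for _ in range(n)]
    for t in times:
        grad = []
        for i in range(n):
            up, down = list(theta), list(theta)
            up[i] += h
            down[i] -= h
            grad.append((model(t, up) - model(t, down)) / (2 * h))
        for i in range(n):
            for j in range(n):
                F[i][j] += grad[i] * grad[j] / sigma**2
    return F


def marginal_errors_2x2(F):
    """Forecast 1-sigma errors sqrt((F^-1)_ii) for a two-parameter problem."""
    (a, b), (c, d) = F
    det = a * d - b * c
    return math.sqrt(d / det), math.sqrt(a / det)


def toy_signal(t, p):
    """Toy stand-in for a waveform model: a straight line p[0] + p[1] * t."""
    return p[0] + p[1] * t


F = fisher_matrix(toy_signal, theta=[1.0, 2.0], times=[0.0, 1.0, 2.0], sigma=1.0)
print(marginal_errors_2x2(F))  # approximately (0.913, 0.707)
```

A real forecast would replace `toy_signal` with a waveform model in which the vertex parameters enter, and would typically work in the frequency domain with the detector noise spectrum instead of a single sigma.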