Substructure Discovery Using Minimum Description Length and Background Knowledge
The ability to identify interesting and repetitive substructures is an
essential component of discovering knowledge in structural data. We describe a
new version of our SUBDUE substructure discovery system based on the minimum
description length principle. The SUBDUE system discovers substructures that
compress the original data and represent structural concepts in the data. By
replacing previously-discovered substructures in the data, multiple passes of
SUBDUE produce a hierarchical description of the structural regularities in the
data. SUBDUE uses a computationally-bounded inexact graph match that identifies
similar, but not identical, instances of a substructure and computes an
approximate measure of the closeness of two substructures under computational
constraints. In addition to the minimum description length principle, other
background knowledge can be used by SUBDUE to guide the search towards more
appropriate substructures. Experiments in a variety of domains demonstrate
SUBDUE's ability to find substructures capable of compressing the original data
and to discover structural concepts important to the domain. Description of
Online Appendix: This is a compressed tar file containing the SUBDUE discovery
system, written in C. The program accepts as input databases represented in
graph form, and will output discovered substructures with their corresponding
value. Comment: See http://www.jair.org/ for an online appendix and other files
accompanying this article.
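SUBDUE's minimum description length heuristic rewards substructures whose replacement compresses the input graph. A minimal sketch of that value computation, assuming a simple bit-count description length rather than SUBDUE's full vertex/row/edge encoding:

```python
import math

def graph_dl(num_vertices, num_edges, num_labels):
    # Crude description length in bits: each vertex costs one label,
    # each edge costs two vertex indices plus one label. This is a
    # simplification of SUBDUE's actual encoding scheme.
    label_bits = math.log2(max(num_labels, 2))
    vertex_bits = math.log2(max(num_vertices, 2))
    return num_vertices * label_bits + num_edges * (2 * vertex_bits + label_bits)

def compression_value(g, s, g_given_s):
    # MDL value = DL(G) / (DL(S) + DL(G|S)); values above 1 mean that
    # replacing instances of substructure S compresses graph G.
    # g, s, g_given_s are (num_vertices, num_edges, num_labels) tuples.
    return graph_dl(*g) / (graph_dl(*s) + graph_dl(*g_given_s))
```

Here `g_given_s` stands for the graph after each discovered instance is collapsed into a single new vertex, which is what makes multiple passes yield a hierarchical description.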
Evolutionary constraints on the complexity of genetic regulatory networks allow predictions of the total number of genetic interactions
Genetic regulatory networks (GRNs) have been widely studied, yet there is a
lack of understanding with regards to the final size and properties of these
networks, mainly due to no network currently being complete. In this study, we
analyzed the distribution of GRN structural properties across a large set of
distinct prokaryotic organisms and found a set of constrained characteristics
such as network density and number of regulators. Our results allowed us to
estimate the number of interactions that complete networks would have, a
valuable insight that could aid in the daunting task of network curation,
prediction, and validation. Using state-of-the-art statistical approaches, we
also provided new evidence to settle a previously stated controversy, which
raised the possibility that complete biological networks are random and that
the observed scale-free properties are an artifact of the sampling process
during network discovery. Furthermore, we
identified a set of properties that enabled us to assess the consistency of the
connectivity distribution for various GRNs against different alternative
statistical distributions. Our results favor the hypothesis that highly
connected nodes (hubs) are not a consequence of network incompleteness.
Finally, an interaction coverage computed for the GRNs as a proxy for
completeness revealed that high-throughput based reconstructions of GRNs could
yield biased networks with a low average clustering coefficient, showing that
classical targeted discovery of interactions is still needed. Comment: 28 pages, 5 figures, 12 pages supplementary information
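The logic of predicting a complete network's interaction count from constrained structural properties can be illustrated with a toy calculation; assuming, purely for illustration, a conserved network density (the study's actual constrained characteristics are richer, and the density value below is a hypothetical placeholder, not a figure from the paper):

```python
def predicted_interactions(num_genes, density):
    # If density, i.e. the fraction of possible directed
    # regulator-target pairs that are realized, is an evolutionarily
    # constrained quantity, then the interaction count of a complete
    # network follows directly from the gene count.
    return density * num_genes * (num_genes - 1)
```

Such an estimate gives curators a target size against which the completeness of a reconstructed network can be gauged.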
Antitrust market definition using statistical learning techniques and consumer characteristics
Market definition is the first step in an antitrust case and relies on empirical evidence of substitution patterns. Cross-price elasticity estimates are the preferred evidence for studying substitution patterns, thanks to advances in IO econometric modelling. However, the data and time requirements of these models weigh against their universal adoption for market definition purposes. These practical constraints, and the need for a greater variety of evidence, lead practitioners to rely on a larger set of less sophisticated tools for market definition. The paper proposes an addition to the existing toolkit, namely an analysis of consumer characteristics for market definition purposes. The paper shows how cluster analysis can be used to identify meaningful groups of substitutes on the basis of the homogeneity of their consumer profiles. Cluster analysis enforces consistency, while recent bootstrap techniques ensure robust conclusions. To illustrate the tool, the paper relies on data from a recently concluded radio merger in South Africa. Keywords: market definition, substitutes, media, demography, clusters, bootstrap
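The core idea, grouping products whose consumer profiles are homogeneous, can be sketched with a simple greedy clustering; the station names, demographic shares, and distance threshold below are all hypothetical, and the paper's actual analysis (with bootstrap validation) is more involved:

```python
import math

def profile_distance(p, q):
    # Euclidean distance between two products' consumer-characteristic
    # shares (e.g. fractions of listeners per demographic group).
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def cluster_substitutes(profiles, threshold):
    # Greedy single-linkage grouping: a product joins the first cluster
    # containing a member whose consumer profile lies within `threshold`
    # of its own; otherwise it starts a new cluster. Products sharing a
    # cluster are treated as candidate substitutes.
    clusters = []
    for name, vec in profiles.items():
        for c in clusters:
            if any(profile_distance(vec, profiles[m]) <= threshold for m in c):
                c.append(name)
                break
        else:
            clusters.append([name])
    return clusters
```

Products that attract similar consumers end up in the same candidate market, which is the substitution signal the paper extracts from demographic data.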
Deconvolving mutational patterns of poliovirus outbreaks reveals its intrinsic fitness landscape.
Vaccination has essentially eradicated poliovirus. Yet, its mutation rate is higher than that of viruses like HIV, for which no effective vaccine exists. To investigate this, we infer a fitness model for the poliovirus viral protein 1 (vp1), which successfully predicts in vitro fitness measurements. This is achieved by first developing a probabilistic model for the prevalence of vp1 sequences that enables us to isolate and remove data that are subject to strong vaccine-derived biases. The intrinsic fitness constraints derived for vp1, a capsid protein subject to antibody responses, are compared with those of analogous HIV proteins. We find that vp1 evolution is subject to tighter constraints, limiting its ability to evade vaccine-induced immune responses. Our analysis also indicates that circulating poliovirus strains in unimmunized populations serve as a reservoir that can seed outbreaks in spatio-temporally localized sub-optimally immunized populations.
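The link between sequence prevalence and fitness that underlies such models can be illustrated with a toy proxy; this is a crude stand-in for the paper's probabilistic model, assuming (hypothetically) that residue prevalence tracks intrinsic fitness once biased data are removed, and with no vaccine-bias correction attempted:

```python
import math
from collections import Counter

def site_fitness_proxy(sequences):
    # For each alignment column, assign every observed residue the log
    # of its prevalence across the strains: residues seen more often
    # get higher (less negative) values. Illustrative only.
    n = len(sequences)
    proxy = []
    for i in range(len(sequences[0])):
        counts = Counter(seq[i] for seq in sequences)
        proxy.append({aa: math.log(c / n) for aa, c in counts.items()})
    return proxy
```

A site where every strain carries the same residue scores 0 (maximal), flagging a tightly constrained position of the kind the abstract reports for vp1.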
Black Hole Demography: From scaling relations to models
In this contributed paper I review our current knowledge of the local Black
Hole (BH) scaling relations, and their impact on the determination of the local
BH mass function. I particularly emphasize the remaining systematic
uncertainties impinging upon a secure determination of the BH mass function and
how progress can be made. I then review and discuss the evidence for a
different time evolution of separate BH-galaxy scaling relations, and how
these independent lines of empirical evidence can be reconciled with the overall
evolution of the structural properties of the host galaxies. I conclude
discussing BH demography in the context of semi-empirical continuity accretion
models, as well as more complex evolutionary models, emphasizing the general
constraints we can set on them. Comment: 20 pages, 5 figures. Invited article for the focus issue on
astrophysical black holes in Classical and Quantum Gravity, guest editors:
D. Merritt and L. Rezzolla
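The scaling relations at the heart of such reviews have the generic power-law form log10(M_BH/M_sun) = alpha + beta * log10(sigma / 200 km/s); a minimal sketch with round illustrative coefficients (not calibrations taken from this article, and published fits differ):

```python
import math

def mbh_from_sigma(sigma_kms, alpha=8.3, beta=5.0):
    # Generic M-sigma relation: black hole mass in solar masses as a
    # power law in the host's stellar velocity dispersion, normalized
    # at 200 km/s. alpha and beta are hypothetical round values.
    return 10 ** (alpha + beta * math.log10(sigma_kms / 200.0))
```

Convolving such a relation with the observed distribution of galaxy velocity dispersions is the standard route to the local BH mass function, which is why the systematic uncertainties in the relation propagate directly into the demography.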
Resolving Structure in Human Brain Organization: Identifying Mesoscale Organization in Weighted Network Representations
Human brain anatomy and function display a combination of modular and
hierarchical organization, suggesting the importance of both cohesive
structures and variable resolutions in the facilitation of healthy cognitive
processes. However, tools to simultaneously probe these features of brain
architecture require further development. We propose and apply a set of methods
to extract cohesive structures in network representations of brain connectivity
using multi-resolution techniques. We employ a combination of soft
thresholding, windowed thresholding, and resolution variation in community
detection, which enables us to identify and isolate structures associated with different
weights. One such mesoscale structure is bipartivity, which quantifies the
extent to which the brain is divided into two partitions with high connectivity
between partitions and low connectivity within partitions. A second,
complementary mesoscale structure is modularity, which quantifies the extent to
which the brain is divided into multiple communities with strong connectivity
within each community and weak connectivity between communities. Our methods
lead to multi-resolution curves of these network diagnostics over a range of
spatial, geometric, and structural scales. For statistical comparison, we
contrast our results with those obtained for several benchmark null models. Our
work demonstrates that multi-resolution diagnostic curves capture complex
organizational profiles in weighted graphs. We apply these methods to the
identification of resolution-specific characteristics of healthy weighted graph
architecture and altered connectivity profiles in psychiatric disease. Comment: Comments welcome
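The modularity diagnostic described above has a standard closed form for weighted graphs; a minimal sketch, without the multi-resolution sweeps the abstract describes (a resolution parameter gamma would scale the null-model term):

```python
def modularity(weights, communities):
    # Weighted Newman modularity:
    #   Q = (1 / 2m) * sum_ij [ w_ij - k_i * k_j / (2m) ] * delta(c_i, c_j)
    # where k_i is the strength of node i, 2m the total weight, and
    # delta selects pairs assigned to the same community.
    strength = [sum(row) for row in weights]
    two_m = sum(strength)
    q = 0.0
    for i, ci in enumerate(communities):
        for j, cj in enumerate(communities):
            if ci == cj:
                q += weights[i][j] - strength[i] * strength[j] / two_m
    return q / two_m
```

High Q (strong within-community, weak between-community weight) is the complement of the bipartivity diagnostic, which instead rewards weight concentrated between two partitions.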
The DEEP2 Galaxy Redshift Survey: The Evolution of Void Statistics from z~1 to z~0
We present measurements of the void probability function (VPF) at z~1 using
data from the DEEP2 Redshift Survey and its evolution to z~0 using data from
the Sloan Digital Sky Survey (SDSS). We measure the VPF as a function of galaxy
color and luminosity in both surveys and find that it mimics trends displayed
in the two-point correlation function, \xi(r); namely, that samples of brighter,
red galaxies have larger voids (i.e. are more strongly clustered) than fainter,
blue galaxies. We also clearly detect evolution in the VPF with cosmic time,
with voids being larger in comoving units at z~0. We find that the reduced VPF
matches the predictions of a `negative binomial' model for galaxies of all
colors, luminosities, and redshifts studied. This model lacks a physical
motivation, but produces a simple analytic prediction for sources of any number
density and integrated two-point correlation function, \bar{\xi}. This implies
that differences in the VPF across different galaxy populations are consistent
with being due entirely to differences in the population number density and
\bar{\xi}. The robust result that all galaxy populations follow the negative
binomial model appears to be due primarily to the clustering of dark matter
halos. The reduced VPF is insensitive to changes in the parameters of the halo
occupation distribution, in the sense that halo models with the same \bar{\xi}
will produce the same VPF. For the wide range of galaxies studied, the VPF
therefore does not appear to provide useful constraints on galaxy evolution
models that cannot be gleaned from studies of \bar{\xi} alone. (abridged) Comment: 17 pages, 15 figures, ApJ accepted
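The negative binomial model referenced above has a standard closed form for the reduced VPF in terms of the mean count \bar{N} in a cell and \bar{\xi}; a minimal sketch:

```python
import math

def reduced_vpf_nb(nbar, xibar):
    # Negative binomial reduced VPF, chi = -ln(P0) / Nbar:
    #   chi = ln(1 + Nbar * xibar) / (Nbar * xibar)
    # where Nbar is the mean galaxy count in a cell and xibar the
    # volume-averaged two-point correlation function over that cell.
    x = nbar * xibar
    return math.log1p(x) / x

def void_probability_nb(nbar, xibar):
    # P0 = exp(-Nbar * chi), equivalently (1 + Nbar*xibar) ** (-1/xibar):
    # the probability that a randomly placed cell contains no galaxies.
    return math.exp(-nbar * reduced_vpf_nb(nbar, xibar))
```

Because chi depends only on the combination \bar{N}\bar{\xi}, any two populations matched in number density and \bar{\xi} predict the same VPF, which is exactly the degeneracy the abstract reports.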
- …