
    Substructure Discovery Using Minimum Description Length and Background Knowledge

    The ability to identify interesting and repetitive substructures is an essential component of discovering knowledge in structural data. We describe a new version of our SUBDUE substructure discovery system based on the minimum description length principle. The SUBDUE system discovers substructures that compress the original data and represent structural concepts in the data. By replacing previously-discovered substructures in the data, multiple passes of SUBDUE produce a hierarchical description of the structural regularities in the data. SUBDUE uses a computationally-bounded inexact graph match that identifies similar, but not identical, instances of a substructure and finds an approximate measure of closeness of two substructures when under computational constraints. In addition to the minimum description length principle, other background knowledge can be used by SUBDUE to guide the search towards more appropriate substructures. Experiments in a variety of domains demonstrate SUBDUE's ability to find substructures capable of compressing the original data and to discover structural concepts important to the domain. Description of Online Appendix: This is a compressed tar file containing the SUBDUE discovery system, written in C. The program accepts as input databases represented in graph form, and will output discovered substructures with their corresponding value. Comment: See http://www.jair.org/ for an online appendix and other files accompanying this article.
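
    As a rough sketch of the minimum description length criterion mentioned in this abstract: writing I(S) for the number of bits needed to encode a candidate substructure S and I(G|S) for the bits needed to encode the graph G after instances of S have been replaced, the preferred substructure is the one giving the shortest total encoding. The symbols I(·), S*, and the ratio form of the "value" below are notation assumed here for illustration; the abstract does not spell out the encoding or the exact scoring formula.

```latex
S^{*} \;=\; \arg\min_{S}\,\bigl[\, I(S) + I(G \mid S) \,\bigr],
\qquad
\text{value}(S, G) \;=\; \frac{I(G)}{I(S) + I(G \mid S)} .
```

    Under this reading, a value greater than 1 means that describing G through S takes fewer bits than describing G directly.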

    Evolutionary constraints on the complexity of genetic regulatory networks allow predictions of the total number of genetic interactions

    Genetic regulatory networks (GRNs) have been widely studied, yet the final size and properties of these networks remain poorly understood, mainly because no network is currently complete. In this study, we analyzed the distribution of GRN structural properties across a large set of distinct prokaryotic organisms and found a set of constrained characteristics such as network density and number of regulators. Our results allowed us to estimate the number of interactions that complete networks would have, a valuable insight that could aid in the daunting task of network curation, prediction, and validation. Using state-of-the-art statistical approaches, we also provided new evidence to settle a previously stated controversy that raised the possibility of complete biological networks being random and therefore attributing the observed scale-free properties to an artifact emerging from the sampling process during network discovery. Furthermore, we identified a set of properties that enabled us to assess the consistency of the connectivity distribution for various GRNs against different alternative statistical distributions. Our results favor the hypothesis that highly connected nodes (hubs) are not a consequence of network incompleteness. Finally, an interaction coverage computed for the GRNs as a proxy for completeness revealed that high-throughput based reconstructions of GRNs could yield biased networks with a low average clustering coefficient, showing that classical targeted discovery of interactions is still needed. Comment: 28 pages, 5 figures, 12 pages supplementary information

    Antitrust market definition using statistical learning techniques and consumer characteristics

    Market definition is the first step in an antitrust case and relies on empirical evidence of substitution patterns. Cross-price elasticity estimates are preferred evidence for studying substitution patterns, due to advances in IO econometric modelling. However, the data and time requirements of these models weigh against their universal adoption for market definition purposes. These practical constraints, and the need for a greater variety of evidence, lead practitioners to rely on a larger set of less sophisticated tools for market definition. The paper proposes an addition to the existing toolkit, namely an analysis of consumer characteristics for market definition purposes. The paper shows how cluster analysis can be used to identify meaningful groups of substitutes on the basis of homogeneity of their consumer profiles. Cluster analysis enforces consistency, while recent bootstrap techniques ensure robust conclusions. To illustrate the tool, the paper relies on data from a recently concluded radio merger in South Africa. Keywords: market definition; substitutes; media; demography; clusters; bootstrap

    Deconvolving mutational patterns of poliovirus outbreaks reveals its intrinsic fitness landscape.

    Vaccination has essentially eradicated poliovirus. Yet, its mutation rate is higher than that of viruses like HIV, for which no effective vaccine exists. To investigate this, we infer a fitness model for the poliovirus viral protein 1 (vp1), which successfully predicts in vitro fitness measurements. This is achieved by first developing a probabilistic model for the prevalence of vp1 sequences that enables us to isolate and remove data that are subject to strong vaccine-derived biases. The intrinsic fitness constraints derived for vp1, a capsid protein subject to antibody responses, are compared with those of analogous HIV proteins. We find that vp1 evolution is subject to tighter constraints, limiting its ability to evade vaccine-induced immune responses. Our analysis also indicates that circulating poliovirus strains in unimmunized populations serve as a reservoir that can seed outbreaks in spatio-temporally localized, sub-optimally immunized populations.

    Black Hole Demography: From scaling relations to models

    In this contributed paper I review our current knowledge of the local Black Hole (BH) scaling relations, and their impact on the determination of the local BH mass function. I particularly emphasize the remaining systematic uncertainties impinging upon a secure determination of the BH mass function and how progress can be made. I then review and discuss the evidence for a different time evolution for separate BH-galaxy scaling relations, and how these independent lines of empirical evidence can be reconciled with the overall evolution of the structural properties of the host galaxies. I conclude by discussing BH demography in the context of semi-empirical continuity accretion models, as well as more complex evolutionary models, emphasizing the general constraints we can set on them. Comment: 20 pages, 5 figures. Invited article for the focus issue on astrophysical black holes in Classical and Quantum Gravity, guest editors: D. Merritt and L. Rezzolla

    Resolving Structure in Human Brain Organization: Identifying Mesoscale Organization in Weighted Network Representations

    Human brain anatomy and function display a combination of modular and hierarchical organization, suggesting the importance of both cohesive structures and variable resolutions in the facilitation of healthy cognitive processes. However, tools to simultaneously probe these features of brain architecture require further development. We propose and apply a set of methods to extract cohesive structures in network representations of brain connectivity using multi-resolution techniques. We employ a combination of soft thresholding, windowed thresholding, and resolution in community detection, which enables us to identify and isolate structures associated with different weights. One such mesoscale structure is bipartivity, which quantifies the extent to which the brain is divided into two partitions with high connectivity between partitions and low connectivity within partitions. A second, complementary mesoscale structure is modularity, which quantifies the extent to which the brain is divided into multiple communities with strong connectivity within each community and weak connectivity between communities. Our methods lead to multi-resolution curves of these network diagnostics over a range of spatial, geometric, and structural scales. For statistical comparison, we contrast our results with those obtained for several benchmark null models. Our work demonstrates that multi-resolution diagnostic curves capture complex organizational profiles in weighted graphs. We apply these methods to the identification of resolution-specific characteristics of healthy weighted graph architecture and altered connectivity profiles in psychiatric disease. Comment: Comments welcome
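
    The modularity diagnostic described in this abstract can be illustrated with a minimal sketch. It assumes the commonly used weighted Newman-Girvan form with a resolution parameter; the abstract does not state which formulation or null model the authors use, and the names `modularity`, `W`, `labels`, and `gamma` are hypothetical choices made for this example.

```python
import numpy as np

def modularity(W, labels, gamma=1.0):
    """Weighted modularity Q of a partition, with resolution parameter gamma.

    Assumed form (not taken from the abstract):
    Q = (1 / 2m) * sum_ij [W_ij - gamma * s_i * s_j / 2m] * delta(c_i, c_j)
    """
    W = np.asarray(W, dtype=float)
    labels = np.asarray(labels)
    strength = W.sum(axis=1)                        # weighted degree of each node
    two_m = strength.sum()                          # total edge weight, counted twice
    same = labels[:, None] == labels[None, :]       # True where i and j share a community
    null = gamma * np.outer(strength, strength) / two_m
    return float(((W - null) * same).sum() / two_m)

# Toy weighted graph: two triangles joined by a single weak edge.
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    W[i, j] = W[j, i] = 1.0
W[2, 3] = W[3, 2] = 0.1

q_split = modularity(W, [0, 0, 0, 1, 1, 1])   # one community per triangle
q_single = modularity(W, [0] * 6)             # everything in one community
```

    Sweeping `gamma` over a range of values and recording Q for the best partition found at each value is one way to obtain a multi-resolution curve of the kind the abstract mentions.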

    The DEEP2 Galaxy Redshift Survey: The Evolution of Void Statistics from z~1 to z~0

    We present measurements of the void probability function (VPF) at z~1 using data from the DEEP2 Redshift Survey and its evolution to z~0 using data from the Sloan Digital Sky Survey (SDSS). We measure the VPF as a function of galaxy color and luminosity in both surveys and find that it mimics trends displayed in the two-point correlation function, \xi; namely that samples of brighter, red galaxies have larger voids (i.e. are more strongly clustered) than fainter, blue galaxies. We also clearly detect evolution in the VPF with cosmic time, with voids being larger in comoving units at z~0. We find that the reduced VPF matches the predictions of a `negative binomial' model for galaxies of all colors, luminosities, and redshifts studied. This model lacks a physical motivation, but produces a simple analytic prediction for sources of any number density and integrated two-point correlation function, \bar{\xi}. This implies that differences in the VPF across different galaxy populations are consistent with being due entirely to differences in the population number density and \bar{\xi}. The robust result that all galaxy populations follow the negative binomial model appears to be due primarily to the clustering of dark matter halos. The reduced VPF is insensitive to changes in the parameters of the halo occupation distribution, in the sense that halo models with the same \bar{\xi} will produce the same VPF. For the wide range of galaxies studied, the VPF therefore does not appear to provide useful constraints on galaxy evolution models that cannot be gleaned from studies of \bar{\xi} alone. (abridged) Comment: 17 pages, 15 figures, ApJ accepted
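
    For reference, the analytic prediction mentioned in this abstract can be written out in the form in which the negative binomial model is usually stated. Here \bar{N} (the mean number of galaxies in the test volume), P_0 (the void probability), and \chi (the reduced VPF) are symbols assumed for this sketch; the abstract itself names only \bar{\xi}.

```latex
P_0 \;=\; \bigl(1 + \bar{N}\,\bar{\xi}\bigr)^{-1/\bar{\xi}},
\qquad
\chi \;\equiv\; -\frac{\ln P_0}{\bar{N}}
\;=\; \frac{\ln\!\bigl(1 + \bar{N}\,\bar{\xi}\bigr)}{\bar{N}\,\bar{\xi}} .
```

    In this form \chi depends only on the product \bar{N}\bar{\xi}, which is consistent with the abstract's statement that VPF differences between populations trace back to number density and \bar{\xi}.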