Substructure Discovery Using Minimum Description Length and Background Knowledge
The ability to identify interesting and repetitive substructures is an
essential component of discovering knowledge in structural data. We describe a
new version of our SUBDUE substructure discovery system based on the minimum
description length principle. The SUBDUE system discovers substructures that
compress the original data and represent structural concepts in the data. By
replacing previously-discovered substructures in the data, multiple passes of
SUBDUE produce a hierarchical description of the structural regularities in the
data. SUBDUE uses a computationally-bounded inexact graph match that identifies
similar, but not identical, instances of a substructure and computes an
approximate measure of the closeness of two substructures under computational
constraints. In addition to the minimum description length principle, other
background knowledge can be used by SUBDUE to guide the search towards more
appropriate substructures. Experiments in a variety of domains demonstrate
SUBDUE's ability to find substructures capable of compressing the original data
and to discover structural concepts important to the domain. Description of
Online Appendix: This is a compressed tar file containing the SUBDUE discovery
system, written in C. The program accepts as input databases represented in
graph form, and will output discovered substructures with their corresponding
value. Comment: See http://www.jair.org/ for an online appendix and other files
accompanying this article.
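SUBDUE's minimum description length heuristic rewards substructures whose replacement compresses the input graph. A minimal sketch of that value computation, assuming a simple bit-count description length rather than SUBDUE's full vertex/row/edge encoding:

```python
import math

def graph_dl(num_vertices, num_edges, num_labels):
    # Crude description length in bits: each vertex costs one label,
    # each edge costs two vertex indices plus one label. This is a
    # simplification of SUBDUE's actual encoding scheme.
    label_bits = math.log2(max(num_labels, 2))
    vertex_bits = math.log2(max(num_vertices, 2))
    return num_vertices * label_bits + num_edges * (2 * vertex_bits + label_bits)

def compression_value(g, s, g_given_s):
    # MDL value = DL(G) / (DL(S) + DL(G|S)); values above 1 mean that
    # replacing instances of substructure S compresses graph G.
    # g, s, g_given_s are (num_vertices, num_edges, num_labels) tuples.
    return graph_dl(*g) / (graph_dl(*s) + graph_dl(*g_given_s))
```

Here `g_given_s` stands for the graph after each discovered instance is collapsed into a single new vertex, which is what makes multiple passes yield a hierarchical description.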
Evolutionary constraints on the complexity of genetic regulatory networks allow predictions of the total number of genetic interactions
Genetic regulatory networks (GRNs) have been widely studied, yet there is a
lack of understanding with regards to the final size and properties of these
networks, mainly due to no network currently being complete. In this study, we
analyzed the distribution of GRN structural properties across a large set of
distinct prokaryotic organisms and found a set of constrained characteristics
such as network density and number of regulators. Our results allowed us to
estimate the number of interactions that complete networks would have, a
valuable insight that could aid in the daunting task of network curation,
prediction, and validation. Using state-of-the-art statistical approaches, we
also provided new evidence to settle a previously stated controversy, which
raised the possibility that complete biological networks are random and that
the observed scale-free properties are an artifact of the sampling process
during network discovery. Furthermore, we
identified a set of properties that enabled us to assess the consistency of the
connectivity distribution for various GRNs against different alternative
statistical distributions. Our results favor the hypothesis that highly
connected nodes (hubs) are not a consequence of network incompleteness.
Finally, an interaction coverage computed for the GRNs as a proxy for
completeness revealed that high-throughput based reconstructions of GRNs could
yield biased networks with a low average clustering coefficient, showing that
classical targeted discovery of interactions is still needed. Comment: 28 pages, 5 figures, 12 pages supplementary information
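The logic of predicting a complete network's interaction count from constrained structural properties can be illustrated with a toy calculation; assuming, purely for illustration, a conserved network density (the study's actual constrained characteristics are richer, and the density value below is a hypothetical placeholder, not a figure from the paper):

```python
def predicted_interactions(num_genes, density):
    # If density, i.e. the fraction of possible directed
    # regulator-target pairs that are realized, is an evolutionarily
    # constrained quantity, then the interaction count of a complete
    # network follows directly from the gene count.
    return density * num_genes * (num_genes - 1)
```

Such an estimate gives curators a target size against which the completeness of a reconstructed network can be gauged.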
Antitrust market definition using statistical learning techniques and consumer characteristics
Market definition is the first step in an antitrust case and relies on empirical evidence of substitution patterns. Cross-price elasticity estimates are the preferred evidence for studying substitution patterns, thanks to advances in IO econometric modelling. However, the data and time requirements of these models weigh against their universal adoption for market definition purposes. These practical constraints, and the need for a greater variety of evidence, lead practitioners to rely on a larger set of less sophisticated tools for market definition. The paper proposes an addition to the existing toolkit, namely an analysis of consumer characteristics for market definition purposes. The paper shows how cluster analysis can be used to identify meaningful groups of substitutes on the basis of the homogeneity of their consumer profiles. Cluster analysis enforces consistency, while recent bootstrap techniques ensure robust conclusions. To illustrate the tool, the paper relies on data from a recently concluded radio merger in South Africa. Keywords: market definition, substitutes, media, demography, clusters, bootstrap
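The core idea, grouping products whose consumer profiles are homogeneous, can be sketched with a simple greedy clustering; the station names, demographic shares, and distance threshold below are all hypothetical, and the paper's actual analysis (with bootstrap validation) is more involved:

```python
import math

def profile_distance(p, q):
    # Euclidean distance between two products' consumer-characteristic
    # shares (e.g. fractions of listeners per demographic group).
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def cluster_substitutes(profiles, threshold):
    # Greedy single-linkage grouping: a product joins the first cluster
    # containing a member whose consumer profile lies within `threshold`
    # of its own; otherwise it starts a new cluster. Products sharing a
    # cluster are treated as candidate substitutes.
    clusters = []
    for name, vec in profiles.items():
        for c in clusters:
            if any(profile_distance(vec, profiles[m]) <= threshold for m in c):
                c.append(name)
                break
        else:
            clusters.append([name])
    return clusters
```

Products that attract similar consumers end up in the same candidate market, which is the substitution signal the paper extracts from demographic data.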
Deconvolving mutational patterns of poliovirus outbreaks reveals its intrinsic fitness landscape.
Vaccination has essentially eradicated poliovirus. Yet, its mutation rate is higher than that of viruses like HIV, for which no effective vaccine exists. To investigate this, we infer a fitness model for the poliovirus viral protein 1 (vp1), which successfully predicts in vitro fitness measurements. This is achieved by first developing a probabilistic model for the prevalence of vp1 sequences that enables us to isolate and remove data that are subject to strong vaccine-derived biases. The intrinsic fitness constraints derived for vp1, a capsid protein subject to antibody responses, are compared with those of analogous HIV proteins. We find that vp1 evolution is subject to tighter constraints, limiting its ability to evade vaccine-induced immune responses. Our analysis also indicates that circulating poliovirus strains in unimmunized populations serve as a reservoir that can seed outbreaks in spatio-temporally localized sub-optimally immunized populations.
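The link between sequence prevalence and fitness that underlies such models can be illustrated with a toy proxy; this is a crude stand-in for the paper's probabilistic model, assuming (hypothetically) that residue prevalence tracks intrinsic fitness once biased data are removed, and with no vaccine-bias correction attempted:

```python
import math
from collections import Counter

def site_fitness_proxy(sequences):
    # For each alignment column, assign every observed residue the log
    # of its prevalence across the strains: residues seen more often
    # get higher (less negative) values. Illustrative only.
    n = len(sequences)
    proxy = []
    for i in range(len(sequences[0])):
        counts = Counter(seq[i] for seq in sequences)
        proxy.append({aa: math.log(c / n) for aa, c in counts.items()})
    return proxy
```

A site where every strain carries the same residue scores 0 (maximal), flagging a tightly constrained position of the kind the abstract reports for vp1.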
Black Hole Demography: From scaling relations to models
In this contributed paper I review our current knowledge of the local Black
Hole (BH) scaling relations, and their impact on the determination of the local
BH mass function. I particularly emphasize the remaining systematic
uncertainties impinging upon a secure determination of the BH mass function and
how progress can be made. I then review and discuss the evidence for a
different time evolution of separate BH-galaxy scaling relations, and how
these independent lines of empirical evidence can be reconciled with the overall
evolution of the structural properties of the host galaxies. I conclude
discussing BH demography in the context of semi-empirical continuity accretion
models, as well as more complex evolutionary models, emphasizing the general
constraints we can set on them. Comment: 20 pages, 5 figures. Invited article for the focus issue on
astrophysical black holes in Classical and Quantum Gravity, guest editors:
D. Merritt and L. Rezzolla
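The scaling relations at the heart of such reviews have the generic power-law form log10(M_BH/M_sun) = alpha + beta * log10(sigma / 200 km/s); a minimal sketch with round illustrative coefficients (not calibrations taken from this article, and published fits differ):

```python
import math

def mbh_from_sigma(sigma_kms, alpha=8.3, beta=5.0):
    # Generic M-sigma relation: black hole mass in solar masses as a
    # power law in the host's stellar velocity dispersion, normalized
    # at 200 km/s. alpha and beta are hypothetical round values.
    return 10 ** (alpha + beta * math.log10(sigma_kms / 200.0))
```

Convolving such a relation with the observed distribution of galaxy velocity dispersions is the standard route to the local BH mass function, which is why the systematic uncertainties in the relation propagate directly into the demography.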
Resolving Structure in Human Brain Organization: Identifying Mesoscale Organization in Weighted Network Representations
Human brain anatomy and function display a combination of modular and
hierarchical organization, suggesting the importance of both cohesive
structures and variable resolutions in the facilitation of healthy cognitive
processes. However, tools to simultaneously probe these features of brain
architecture require further development. We propose and apply a set of methods
to extract cohesive structures in network representations of brain connectivity
using multi-resolution techniques. We employ a combination of soft
thresholding, windowed thresholding, and resolution variation in community
detection, which enables us to identify and isolate structures associated with different
weights. One such mesoscale structure is bipartivity, which quantifies the
extent to which the brain is divided into two partitions with high connectivity
between partitions and low connectivity within partitions. A second,
complementary mesoscale structure is modularity, which quantifies the extent to
which the brain is divided into multiple communities with strong connectivity
within each community and weak connectivity between communities. Our methods
lead to multi-resolution curves of these network diagnostics over a range of
spatial, geometric, and structural scales. For statistical comparison, we
contrast our results with those obtained for several benchmark null models. Our
work demonstrates that multi-resolution diagnostic curves capture complex
organizational profiles in weighted graphs. We apply these methods to the
identification of resolution-specific characteristics of healthy weighted graph
architecture and altered connectivity profiles in psychiatric disease. Comment: Comments welcome
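The modularity diagnostic described above has a standard closed form for weighted graphs; a minimal sketch, without the multi-resolution sweeps the abstract describes (a resolution parameter gamma would scale the null-model term):

```python
def modularity(weights, communities):
    # Weighted Newman modularity:
    #   Q = (1 / 2m) * sum_ij [ w_ij - k_i * k_j / (2m) ] * delta(c_i, c_j)
    # where k_i is the strength of node i, 2m the total weight, and
    # delta selects pairs assigned to the same community.
    strength = [sum(row) for row in weights]
    two_m = sum(strength)
    q = 0.0
    for i, ci in enumerate(communities):
        for j, cj in enumerate(communities):
            if ci == cj:
                q += weights[i][j] - strength[i] * strength[j] / two_m
    return q / two_m
```

High Q (strong within-community, weak between-community weight) is the complement of the bipartivity diagnostic, which instead rewards weight concentrated between two partitions.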
The DEEP2 Galaxy Redshift Survey: The Evolution of Void Statistics from z~1 to z~0
We present measurements of the void probability function (VPF) at z~1 using
data from the DEEP2 Redshift Survey and its evolution to z~0 using data from
the Sloan Digital Sky Survey (SDSS). We measure the VPF as a function of galaxy
color and luminosity in both surveys and find that it mimics trends displayed
in the two-point correlation function, \xi(r); namely, that samples of brighter,
red galaxies have larger voids (i.e. are more strongly clustered) than fainter,
blue galaxies. We also clearly detect evolution in the VPF with cosmic time,
with voids being larger in comoving units at z~0. We find that the reduced VPF
matches the predictions of a `negative binomial' model for galaxies of all
colors, luminosities, and redshifts studied. This model lacks a physical
motivation, but produces a simple analytic prediction for sources of any number
density and integrated two-point correlation function, \bar{\xi}. This implies
that differences in the VPF across different galaxy populations are consistent
with being due entirely to differences in the population number density and
\bar{\xi}. The robust result that all galaxy populations follow the negative
binomial model appears to be due primarily to the clustering of dark matter
halos. The reduced VPF is insensitive to changes in the parameters of the halo
occupation distribution, in the sense that halo models with the same \bar{\xi}
will produce the same VPF. For the wide range of galaxies studied, the VPF
therefore does not appear to provide useful constraints on galaxy evolution
models that cannot be gleaned from studies of \bar{\xi} alone. (abridged) Comment: 17 pages, 15 figures, ApJ accepted
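The negative binomial model referenced above has a standard closed form for the reduced VPF in terms of the mean count \bar{N} in a cell and \bar{\xi}; a minimal sketch:

```python
import math

def reduced_vpf_nb(nbar, xibar):
    # Negative binomial reduced VPF, chi = -ln(P0) / Nbar:
    #   chi = ln(1 + Nbar * xibar) / (Nbar * xibar)
    # where Nbar is the mean galaxy count in a cell and xibar the
    # volume-averaged two-point correlation function over that cell.
    x = nbar * xibar
    return math.log1p(x) / x

def void_probability_nb(nbar, xibar):
    # P0 = exp(-Nbar * chi), equivalently (1 + Nbar*xibar) ** (-1/xibar):
    # the probability that a randomly placed cell contains no galaxies.
    return math.exp(-nbar * reduced_vpf_nb(nbar, xibar))
```

Because chi depends only on the combination \bar{N}\bar{\xi}, any two populations matched in number density and \bar{\xi} predict the same VPF, which is exactly the degeneracy the abstract reports.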
- …