Ambiguous model learning made unambiguous with 1/f priors
What happens to the optimal interpretation of noisy data when there exists
more than one equally plausible interpretation of the data? In a Bayesian
model-learning framework the answer depends on the prior expectations of the
dynamics of the model parameter that is to be inferred from the data. Local
time constraints on the priors are insufficient to pick one interpretation over
another. On the other hand, nonlocal time constraints, induced by a noise
spectrum of the priors, are shown to permit learning of a specific model
parameter even when there are infinitely many equally plausible interpretations
of the data. This transition is inferred by a remarkable mapping of the model
estimation problem to a dissipative physical system, allowing the use of
powerful statistical mechanical methods to uncover the transition from
indeterminate to determinate model learning. Comment: 8 pages, 2 figures
Parametric inference in the large data limit using maximally informative models
Motivated by data-rich experiments in transcriptional regulation and sensory
neuroscience, we consider the following general problem in statistical
inference. When exposed to a high-dimensional signal S, a system of interest
computes a representation R of that signal which is then observed through a
noisy measurement M. From a large number of signals and measurements, we wish
to infer the "filter" that maps S to R. However, the standard method for
solving such problems, likelihood-based inference, requires perfect a priori
knowledge of the "noise function" mapping R to M. In practice such noise
functions are usually known only approximately, if at all, and using an
incorrect noise function will typically bias the inferred filter. Here we show
that, in the large data limit, this need for a pre-characterized noise function
can be circumvented by searching for filters that instead maximize the mutual
information I[M;R] between observed measurements and predicted representations.
Moreover, if the correct filter lies within the space of filters being
explored, maximizing mutual information becomes equivalent to simultaneously
maximizing every dependence measure that satisfies the Data Processing
Inequality. It is important to note that maximizing mutual information will
typically leave a small number of directions in parameter space unconstrained.
We term these directions "diffeomorphic modes" and present an equation that
allows these modes to be derived systematically. The presence of diffeomorphic
modes reflects a fundamental and nontrivial substructure within parameter
space, one that is obscured by standard likelihood-based inference. Comment: To appear in Neural Computation
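As a rough illustration of the selection criterion described in this abstract, the following Python sketch scores a few candidate linear filters by a plug-in estimate of I[M;R] on simulated data. Everything in it is an assumption made for illustration (the two-dimensional Gaussian signals, the noisy binary measurement, the quantile binning, and all function names); it is not the authors' estimator or code.

```python
import math
import random
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I[X;Y] in bits from paired discrete samples."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )


def quantile_bins(values, n_bins=8):
    """Replace each value by the index of its equal-population rank bin."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(values)
    return bins


def score_filter(theta, signals, measurements):
    """Estimated I[M;R] for the linear filter R = theta . S."""
    predictions = [theta[0] * s[0] + theta[1] * s[1] for s in signals]
    return mutual_information(quantile_bins(predictions), measurements)


# Simulated experiment (hypothetical): true filter (1, 0), noisy binary readout.
rng = random.Random(0)
signals = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(2000)]
measurements = [int(s[0] + 0.5 * rng.gauss(0, 1) > 0) for s in signals]

# Candidate filters: unit vectors at a few angles from the true direction.
angles = [0, 30, 60, 90]
scores = {
    a: score_filter(
        (math.cos(math.radians(a)), math.sin(math.radians(a))),
        signals,
        measurements,
    )
    for a in angles
}
best_angle = max(scores, key=scores.get)

# Rescaling a filter preserves the rank order of its predictions, so the
# binned estimate cannot distinguish (1, 0) from (2, 0).
same_score = score_filter((1.0, 0.0), signals, measurements) == score_filter(
    (2.0, 0.0), signals, measurements
)
```

In this toy setup the candidate aligned with the true filter should receive the highest score, and no noise model is supplied at any point. The rescaled filter receives an identical score, which is a simple instance of a direction in parameter space that mutual information leaves unconstrained.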
Equitability, mutual information, and the maximal information coefficient
Reshef et al. recently proposed a new statistical measure, the "maximal
information coefficient" (MIC), for quantifying arbitrary dependencies between
pairs of stochastic quantities. MIC is based on mutual information, a
fundamental quantity in information theory that is widely understood to serve
this need. MIC, however, is not an estimate of mutual information. Indeed, it
was claimed that MIC possesses a desirable mathematical property called
"equitability" that mutual information lacks. This was not proven; instead it
was argued solely through the analysis of simulated data. Here we show that
this claim, in fact, is incorrect. First we offer mathematical proof that no
(non-trivial) dependence measure satisfies the definition of equitability
proposed by Reshef et al. We then propose a self-consistent and more general
definition of equitability that follows naturally from the Data Processing
Inequality. Mutual information satisfies this new definition of equitability
while MIC does not. Finally, we show that the simulation evidence offered by
Reshef et al. was artifactual. We conclude that estimating mutual information
is not only practical for many real-world applications, but also provides a
natural solution to the problem of quantifying associations in large data sets.
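The Data Processing Inequality invoked in this abstract states that for a Markov chain X -> Y -> Z, I[X;Z] <= I[X;Y]. The Python sketch below checks the inequality on a small simulated sample with a plug-in estimate of mutual information. The variables, noise level, and coarse-graining step are assumptions chosen for illustration; the sketch does not reproduce the paper's equitability analysis.

```python
import math
import random
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I[X;Y] in bits from paired discrete samples."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )


rng = random.Random(1)

# X is a uniform symbol in 0..7; Y copies X but is sometimes replaced at random.
xs = [rng.randrange(8) for _ in range(5000)]
ys = [x if rng.random() < 0.8 else rng.randrange(8) for x in xs]

# Z is a deterministic coarse-graining of Y, so X -> Y -> Z is a Markov chain.
zs = [y // 2 for y in ys]

i_xy = mutual_information(xs, ys)
i_xz = mutual_information(xs, zs)
print(f"I[X;Y] = {i_xy:.3f} bits, I[X;Z] = {i_xz:.3f} bits")
```

Because Z is computed from Y alone, the estimate of I[X;Z] cannot exceed the estimate of I[X;Y], up to floating-point rounding.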
Kerfuffle: a web tool for multi-species gene colocalization analysis
The evolutionary pressures that underlie the large-scale functional
organization of the genome are not well understood in eukaryotes. Recent
evidence suggests that functionally similar genes may colocalize (cluster) in
the eukaryotic genome, suggesting the role of chromatin-level gene regulation
in shaping the physical distribution of coordinated genes. However, few of the
bioinformatic tools currently available allow for a systematic study of gene
colocalization across several, evolutionarily distant species. Kerfuffle is a
web tool designed to help discover, visualize, and quantify the physical
organization of genomes by identifying significant gene colocalization and
conservation across the assembled genomes of available species (currently up to
47, from humans to worms). Kerfuffle only requires the user to specify a list
of human genes and the names of other species of interest. Without further
input from the user, the software queries the e!Ensembl BioMart server to
obtain positional information and discovers homology relations in all genes and
species specified. Using this information, Kerfuffle performs a multi-species
clustering analysis, presents downloadable lists of clustered genes, performs
Monte Carlo statistical significance calculations, estimates how conserved gene
clusters are across species, plots histograms and interactive graphs, allows
users to save their queries, and generates a downloadable visualization of the
clusters using the Circos software. These analyses may be used to further
explore the functional roles of gene clusters by interrogating the enriched
molecular pathways associated with each cluster. Comment: BMC Bioinformatics, in press
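A Monte Carlo significance calculation of the kind mentioned in this abstract can be pictured with the generic Python sketch below. It is not Kerfuffle's actual procedure: the clustering statistic (the number of neighbouring genes within a fixed window), the uniform null model, and every name and coordinate are assumptions made for illustration.

```python
import random


def count_close_pairs(positions, window):
    """Number of neighbouring genes (in sorted order) within `window` bp."""
    ordered = sorted(positions)
    return sum(1 for a, b in zip(ordered, ordered[1:]) if b - a <= window)


def monte_carlo_p_value(positions, chrom_length, window, n_trials=999, seed=0):
    """Fraction of random placements that look at least as clustered."""
    rng = random.Random(seed)
    observed = count_close_pairs(positions, window)
    hits = 0
    for _ in range(n_trials):
        # Null model (assumed): genes placed uniformly along the chromosome.
        shuffled = [rng.randrange(chrom_length) for _ in positions]
        if count_close_pairs(shuffled, window) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_trials + 1)


# Hypothetical gene start coordinates on a 1 Mb chromosome.
clustered = [1000, 1010, 1020, 1030, 1040]
spread = [100_000, 300_000, 500_000, 700_000, 900_000]

p_clustered = monte_carlo_p_value(clustered, chrom_length=1_000_000, window=100)
p_spread = monte_carlo_p_value(spread, chrom_length=1_000_000, window=100)
print(p_clustered, p_spread)
```

The tightly packed gene list should yield a small p-value. The evenly spread list has no close pairs, so every random placement is at least as clustered and its p-value is 1.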
Estimating mutual information and multi-information in large networks
We address the practical problems of estimating the information relations
that characterize large networks. Building on methods developed for analysis of
the neural code, we show that reliable estimates of mutual information can be
obtained with manageable computational effort. The same methods allow
estimation of higher-order multi-information terms. These ideas are
illustrated by analyses of gene expression, financial markets, and consumer
preferences. In each case, information theoretic measures correlate with
independent, intuitive measures of the underlying structures in the system.
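For reference, the multi-information of N variables is the sum of their marginal entropies minus their joint entropy. The Python sketch below computes a naive plug-in version from discrete samples. The plug-in estimator is biased for small samples and is used here only as an assumption for illustration; it is not the estimation method developed in the paper, and the function names and toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(samples):
    """Plug-in entropy in bits of a sequence of hashable samples."""
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in Counter(samples).values())


def multi_information(rows):
    """Sum of marginal entropies minus joint entropy, in bits.

    `rows` is a list of equal-length tuples, one tuple per observation.
    """
    columns = list(zip(*rows))
    joint = [tuple(row) for row in rows]
    return sum(entropy(col) for col in columns) - entropy(joint)


# Three perfectly copied fair bits: marginals sum to 3 bits, joint is 1 bit.
copies = [(0, 0, 0), (1, 1, 1)] * 50

# Three independent fair bits: every combination appears equally often.
independent = [(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)] * 10

mi_copies = multi_information(copies)
mi_independent = multi_information(independent)
print(mi_copies, mi_independent)
```

For two variables this quantity reduces to the ordinary mutual information, so the same function covers both cases.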
Developmental Coordination of Gene Expression between Synaptic Partners During GABAergic Circuit Assembly in Cerebellar Cortex
The assembly of neural circuits involves multiple sequential steps such as the specification of cell-types, their migration to proper brain locations, morphological and physiological differentiation, and the formation and maturation of synaptic connections. This intricate and often prolonged process is guided by elaborate genetic mechanisms that regulate each step. Evidence from numerous systems suggests that each cell-type, once specified, is endowed with a genetic program that unfolds in response to, and is regulated by, extrinsic signals, including cell-cell and synaptic interactions. To a large extent, the execution of this intrinsic program is achieved by the expression of specific sets of genes that support distinct developmental processes. Therefore, a comprehensive analysis of the developmental progression of gene expression in synaptic partners of neurons may provide a basis for exploring the genetic mechanisms regulating circuit assembly. Here we examined the developmental gene expression profiles of well-defined cell-types in a stereotyped microcircuit of the cerebellar cortex. We found that the transcriptomes of Purkinje cells and stellate/basket cells are highly dynamic throughout postnatal development. We revealed "phasic expression" of transcription factors, ion channels, receptors, cell adhesion molecules, and gap junction proteins, and identified distinct molecular pathways that might contribute to sequential steps of cerebellar inhibitory circuit formation. We further revealed a correlation between genomic clustering and developmental co-expression of hundreds of transcripts, suggesting the involvement of chromatin-level gene regulation during circuit formation.
Cell non-autonomous interactions during non-immune stromal progression in the breast tumor microenvironment
The breast tumor microenvironment of primary and metastatic sites is a complex milieu of differing cell populations, consisting of tumor cells and the surrounding stroma. Despite recent progress in delineating the immune component of the stroma, the genomic expression landscape of the non-immune stroma (NIS) population and their role in mediating cancer progression and informing effective therapies are not well understood. Here we obtained 52 cell-sorted NIS and epithelial tissue samples across 37 patients from i) normal breast, ii) normal breast adjacent to primary tumor, iii) primary tumor, and iv) metastatic tumor sites. Deep RNA-seq revealed diverging gene expression profiles as the NIS evolves from normal to metastatic tumor tissue, with intra-patient normal-primary variation comparable to inter-patient variation. Significant expression changes between normal and adjacent normal tissue support the notion of a cancer field effect, here extended to the NIS. Most differentially expressed protein-coding genes and lncRNAs were found to be associated with pattern formation, embryogenesis, and the epithelial-mesenchymal transition. We validated the protein expression changes of a novel candidate gene, C2orf88, by immunohistochemistry staining of representative tissues. Significant mutual information between epithelial ligand and NIS receptor gene expression, across primary and metastatic tissue, suggests a unidirectional model of molecular signaling between the two tissues. Furthermore, survival analyses of 827 luminal breast tumor samples demonstrated the predictive power of the NIS gene expression to inform clinical outcomes. Together, these results highlight the evolution of NIS gene expression in breast tumors and suggest novel therapeutic strategies targeting the microenvironment.