23 research outputs found

    Parametric inference in the large data limit using maximally informative models

    Motivated by data-rich experiments in transcriptional regulation and sensory neuroscience, we consider the following general problem in statistical inference. When exposed to a high-dimensional signal S, a system of interest computes a representation R of that signal which is then observed through a noisy measurement M. From a large number of signals and measurements, we wish to infer the "filter" that maps S to R. However, the standard method for solving such problems, likelihood-based inference, requires perfect a priori knowledge of the "noise function" mapping R to M. In practice such noise functions are usually known only approximately, if at all, and using an incorrect noise function will typically bias the inferred filter. Here we show that, in the large data limit, this need for a pre-characterized noise function can be circumvented by searching for filters that instead maximize the mutual information I[M;R] between observed measurements and predicted representations. Moreover, if the correct filter lies within the space of filters being explored, maximizing mutual information becomes equivalent to simultaneously maximizing every dependence measure that satisfies the Data Processing Inequality. It is important to note that maximizing mutual information will typically leave a small number of directions in parameter space unconstrained. We term these directions "diffeomorphic modes" and present an equation that allows these modes to be derived systematically. The presence of diffeomorphic modes reflects a fundamental and nontrivial substructure within parameter space, one that is obscured by standard likelihood-based inference.
    Comment: To appear in Neural Computation
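
The idea in this abstract can be illustrated with a small simulation. The sketch below is not the authors' method: the linear filter, the tanh-plus-Gaussian noise function, the rank-binned plug-in estimator, and all names (`quantile_bins`, `mutual_information`, `theta_true`, `theta_wrong`) are assumptions chosen for illustration. It scores candidate filters by an estimate of I[M;R] without using the noise function, and shows that rescaling a filter leaves the score unchanged, which is one simple example of a direction that mutual information cannot constrain.

```python
import numpy as np

def quantile_bins(x, n_bins):
    """Assign each value to one of n_bins equal-population bins by rank."""
    ranks = np.argsort(np.argsort(x, kind="stable"), kind="stable")
    return (ranks * n_bins) // len(x)

def mutual_information(a, b, n_bins=10):
    """Plug-in estimate (in bits) of I[a;b] from rank-binned samples."""
    ia, ib = quantile_bins(a, n_bins), quantile_bins(b, n_bins)
    joint = np.zeros((n_bins, n_bins))
    np.add.at(joint, (ia, ib), 1.0)          # 2-D histogram of bin pairs
    joint /= joint.sum()
    pa = joint.sum(axis=1, keepdims=True)    # marginal of a, shape (n_bins, 1)
    pb = joint.sum(axis=0, keepdims=True)    # marginal of b, shape (1, n_bins)
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log2(joint[nz] / (pa @ pb)[nz])))

# Simulated experiment (all settings are illustrative assumptions).
rng = np.random.default_rng(0)
n, d = 20000, 5
S = rng.normal(size=(n, d))                        # high-dimensional signals
theta_true = np.array([1.0, -0.5, 0.25, 0.0, 0.0]) # hypothetical true filter
R = S @ theta_true                                 # representation
M = np.tanh(R) + 0.3 * rng.normal(size=n)          # noise function, unused below

# Score candidate filters by the estimated I[M; predicted R].
theta_wrong = np.array([0.0, 0.0, 0.0, 1.0, 0.0])
mi_true = mutual_information(M, S @ theta_true)
mi_wrong = mutual_information(M, S @ theta_wrong)
mi_scaled = mutual_information(M, S @ (2.0 * theta_true))

print(f"true filter:   {mi_true:.3f} bits")
print(f"wrong filter:  {mi_wrong:.3f} bits")
print(f"scaled filter: {mi_scaled:.3f} bits")
```

Rank-based binning is used here because it makes the estimate invariant to any monotone rescaling of the predicted representation, so the scaled filter receives the same score as the true one.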

    Equitability, mutual information, and the maximal information coefficient

    Reshef et al. recently proposed a new statistical measure, the "maximal information coefficient" (MIC), for quantifying arbitrary dependencies between pairs of stochastic quantities. MIC is based on mutual information, a fundamental quantity in information theory that is widely understood to serve this need. MIC, however, is not an estimate of mutual information. Indeed, it was claimed that MIC possesses a desirable mathematical property called "equitability" that mutual information lacks. This was not proven; instead it was argued solely through the analysis of simulated data. Here we show that this claim is, in fact, incorrect. First, we offer a mathematical proof that no (non-trivial) dependence measure satisfies the definition of equitability proposed by Reshef et al. We then propose a self-consistent and more general definition of equitability that follows naturally from the Data Processing Inequality. Mutual information satisfies this new definition of equitability while MIC does not. Finally, we show that the simulation evidence offered by Reshef et al. was artifactual. We conclude that estimating mutual information is not only practical for many real-world applications, but also provides a natural solution to the problem of quantifying associations in large data sets.

    Kerfuffle: a web tool for multi-species gene colocalization analysis

    The evolutionary pressures that underlie the large-scale functional organization of the genome are not well understood in eukaryotes. Recent evidence suggests that functionally similar genes may colocalize (cluster) in the eukaryotic genome, suggesting the role of chromatin-level gene regulation in shaping the physical distribution of coordinated genes. However, few of the bioinformatic tools currently available allow for a systematic study of gene colocalization across several, evolutionarily distant species. Kerfuffle is a web tool designed to help discover, visualize, and quantify the physical organization of genomes by identifying significant gene colocalization and conservation across the assembled genomes of available species (currently up to 47, from humans to worms). Kerfuffle only requires the user to specify a list of human genes and the names of other species of interest. Without further input from the user, the software queries the Ensembl BioMart server to obtain positional information and discovers homology relations in all genes and species specified. Using this information, Kerfuffle performs a multi-species clustering analysis, presents downloadable lists of clustered genes, performs Monte Carlo statistical significance calculations, estimates how conserved gene clusters are across species, plots histograms and interactive graphs, allows users to save their queries, and generates a downloadable visualization of the clusters using the Circos software. These analyses may be used to further explore the functional roles of gene clusters by interrogating the enriched molecular pathways associated with each cluster.
    Comment: BMC Bioinformatics, In press
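
The abstract mentions Monte Carlo significance calculations for gene clusters but does not describe the statistic used. The following is a generic sketch of such a test, not Kerfuffle's implementation: the clustering statistic (number of neighbouring genes closer than a window), the uniform null model, and the names `n_close_pairs` and `monte_carlo_pvalue` are all assumptions made for illustration.

```python
import numpy as np

def n_close_pairs(positions, window):
    """Count neighbouring genes (after sorting) closer than `window` bp."""
    gaps = np.diff(np.sort(positions))
    return int(np.sum(gaps < window))

def monte_carlo_pvalue(positions, chrom_length, window, n_trials=2000, seed=0):
    """Estimate how often random placement clusters at least as much."""
    rng = np.random.default_rng(seed)
    observed = n_close_pairs(positions, window)
    null = np.array([
        n_close_pairs(rng.integers(0, chrom_length, size=len(positions)), window)
        for _ in range(n_trials)
    ])
    # Add-one correction keeps the estimated p-value strictly positive.
    return (1 + np.sum(null >= observed)) / (1 + n_trials)

# Hypothetical gene start coordinates on a 100 Mb chromosome:
# four genes packed within 70 kb, plus two distant genes.
clustered = np.array([1_000_000, 1_020_000, 1_050_000, 1_070_000,
                      40_000_000, 90_000_000])
p = monte_carlo_pvalue(clustered, chrom_length=100_000_000, window=50_000)
print(f"estimated p-value: {p:.4f}")
```

A more realistic null model would draw positions from the annotated gene coordinates of the chromosome rather than from uniform coordinates, since real genes are not uniformly distributed.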

    Cell non-autonomous interactions during non-immune stromal progression in the breast tumor microenvironment

    The breast tumor microenvironment of primary and metastatic sites is a complex milieu of differing cell populations, consisting of tumor cells and the surrounding stroma. Despite recent progress in delineating the immune component of the stroma, the genomic expression landscape of the non-immune stroma (NIS) population and its role in mediating cancer progression and informing effective therapies are not well understood. Here we obtained 52 cell-sorted NIS and epithelial tissue samples across 37 patients from i) normal breast, ii) normal breast adjacent to primary tumor, iii) primary tumor, and iv) metastatic tumor sites. Deep RNA-seq revealed diverging gene expression profiles as the NIS evolves from normal to metastatic tumor tissue, with intra-patient normal-primary variation comparable to inter-patient variation. Significant expression changes between normal and adjacent normal tissue support the notion of a cancer field effect, here extended to the NIS. Most differentially expressed protein-coding genes and lncRNAs were found to be associated with pattern formation, embryogenesis, and the epithelial-mesenchymal transition. We validated the protein expression changes of a novel candidate gene, C2orf88, by immunohistochemistry staining of representative tissues. Significant mutual information between epithelial ligand and NIS receptor gene expression, across primary and metastatic tissue, suggests a unidirectional model of molecular signaling between the two tissues. Furthermore, survival analyses of 827 luminal breast tumor samples demonstrated the predictive power of the NIS gene expression to inform clinical outcomes. Together, these results highlight the evolution of NIS gene expression in breast tumors and suggest novel therapeutic strategies targeting the microenvironment.

    A framework for highly multiplexed dextramer mapping and prediction of T cell receptor sequences to antigen specificity

    T cell receptor (TCR) antigen-specific recognition is essential for the adaptive immune system. However, building a TCR-antigen interaction map has been challenging due to the staggering diversity of TCRs and antigens. Accordingly, highly multiplexed dextramer-TCR binding assays have recently been developed, but the utility of the ensuing large datasets is limited by the lack of robust computational methods for normalization and interpretation. Here, we present a computational framework comprising a novel method, ICON (Integrative COntext-specific Normalization), for identifying reliable TCR-pMHC (peptide-major histocompatibility complex) interactions and a neural network-based classifier, TCRAI, that outperforms other state-of-the-art methods for TCR-antigen specificity prediction. We further demonstrate that by combining ICON and TCRAI, we are able to discover novel subgroups of TCRs that bind to a given pMHC via different mechanisms. Our framework facilitates the identification and understanding of TCR-antigen-specific interactions for basic immunological research and clinical immune monitoring.

    Fine-scale detection of population-specific linkage disequilibrium using haplotype entropy in the human genome

    Background: The creation of a coherent genomic map of recent selection is one of the greatest challenges towards a better understanding of human evolution and the identification of functional genetic variants. Several methods have been proposed to detect linkage disequilibrium (LD), which is indicative of natural selection, from genome-wide profiles of common genetic variations but are designed for large regions.
    Results: To find population-specific LD within small regions, we have devised an entropy-based method that utilizes differences in haplotype frequency between populations. The method has the advantages of incorporating multilocus association, conciliation with low allele frequencies, and independence from allele polarity, which are ideal for short haplotype analysis. The comparison of HapMap SNPs data from African and Caucasian populations with a median resolution size of ~23 kb gave us novel candidates as well as known selection targets. Enrichment analysis for the yielded genes showed associations with diverse diseases such as cardiovascular, immunological, neurological, and skeletal and muscular diseases. A possible scenario for a selective force is discussed. In addition, we have developed a web interface (ENIGMA, available at http://gibk21.bse.kyutech.ac.jp/ENIGMA/index.html), which allows researchers to query their regions of interest for population-specific LD.
    Conclusion: The haplotype entropy method is powerful for detecting population-specific LD embedded in short regions and should contribute to further studies aiming to decipher the evolutionary histories of modern humans.
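
The abstract does not give the exact statistic, so the following is only a minimal sketch of the general idea of comparing haplotype entropy between populations. It assumes that the Shannon entropy of the haplotype frequency distribution in a short window is computed per population and that the difference is used as the signal; the function `haplotype_entropy` and the toy haplotypes are hypothetical.

```python
from collections import Counter
import math

def haplotype_entropy(haplotypes):
    """Shannon entropy (bits) of the haplotype frequency distribution."""
    counts = Counter(haplotypes)
    n = len(haplotypes)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Toy phased haplotypes over the same four SNPs in two populations.
# Population A is dominated by one haplotype (strong LD);
# population B carries four haplotypes at equal frequency.
pop_a = ["ACGT"] * 18 + ["ACGA"] * 2
pop_b = ["ACGT"] * 5 + ["ACGA"] * 5 + ["TCGA"] * 5 + ["TCGT"] * 5

h_a = haplotype_entropy(pop_a)
h_b = haplotype_entropy(pop_b)
delta = h_b - h_a   # large positive value: LD specific to population A

print(f"H(A) = {h_a:.3f} bits, H(B) = {h_b:.3f} bits, difference = {delta:.3f}")
```

Because the entropy depends only on haplotype frequencies, it uses all loci in the window jointly and is unaffected by which allele is labelled as reference, which matches two of the advantages named in the abstract.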

    Novel Foxo1-dependent transcriptional programs control Treg cell function

    Regulatory T (Treg) cells, characterized by expression of the transcription factor forkhead box P3 (Foxp3), maintain immune homeostasis by suppressing self-destructive immune responses. Foxp3 operates as a late-acting differentiation factor controlling Treg cell homeostasis and function, whereas the early Treg-cell-lineage commitment is regulated by the Akt kinase and the forkhead box O (Foxo) family of transcription factors. However, whether Foxo proteins act beyond the Treg-cell-commitment stage to control Treg cell homeostasis and function remains largely unexplored. Here we show that Foxo1 is a pivotal regulator of Treg cell function. Treg cells express high amounts of Foxo1 and display reduced T-cell-receptor-induced Akt activation, Foxo1 phosphorylation and Foxo1 nuclear exclusion. Mice with Treg-cell-specific deletion of Foxo1 develop a fatal inflammatory disorder similar in severity to that seen in Foxp3-deficient mice, but without the loss of Treg cells. Genome-wide analysis of Foxo1 binding sites reveals ~300 Foxo1-bound target genes, including the pro-inflammatory cytokine Ifng, that do not seem to be directly regulated by Foxp3. These findings show that the evolutionarily ancient Akt-Foxo1 signalling module controls a novel genetic program indispensable for Treg cell function.