    Extensions of the External Validation for Checking Learned Model Interpretability and Generalizability.

    We discuss the validation of machine learning models, which is standard practice in determining model efficacy and generalizability. We argue that internal validation approaches, such as cross-validation and the bootstrap, cannot guarantee the quality of a machine learning model because the training data may be biased and the validation procedure itself is complex. To better evaluate the generalization ability of a learned model, we suggest using data from external sources as validation datasets, an approach known as external validation. Because external validation has attracted little research attention, and in particular lacks a well-structured and comprehensive study, we discuss why it is necessary and propose two extensions of the approach that may help identify the true domain-relevant model among a set of candidates. We also suggest a procedure for checking whether a set of validation datasets is valid and introduce statistical reference points for detecting problems with external data.
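
    The gap between internal and external validation can be illustrated with a minimal sketch. It is not the authors' proposed extensions: the helper names (`fit_line`, `k_fold_mse`, `external_mse`), the synthetic quadratic data, and the shifted external input range are all hypothetical. A straight line fitted to y = x² on [0, 1) scores well under 5-fold cross-validation, yet the same model fails on an external sample drawn from a different input range.

```python
from statistics import mean

def fit_line(rows):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    xs = [x for x, _ in rows]
    ys = [y for _, y in rows]
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in rows)
    b = sxy / sxx
    return my - b * mx, b

def mse(model, rows):
    """Mean squared error of a fitted line (a, b) on the given (x, y) rows."""
    a, b = model
    return mean((y - (a + b * x)) ** 2 for x, y in rows)

def k_fold_mse(rows, k=5):
    """Internal validation: average held-out MSE over k interleaved folds."""
    folds = [rows[i::k] for i in range(k)]
    scores = []
    for i in range(k):
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        scores.append(mse(fit_line(train), folds[i]))
    return mean(scores)

def external_mse(train_rows, external_rows):
    """External validation: fit on all internal data, score on an outside dataset."""
    return mse(fit_line(train_rows), external_rows)

# Hypothetical data: the true relation is y = x**2, but the internal sample
# only covers x in [0, 1) while the external sample covers x in [2, 3).
internal = [(i / 20, (i / 20) ** 2) for i in range(20)]
external = [(2 + i / 20, (2 + i / 20) ** 2) for i in range(20)]

cv = k_fold_mse(internal)
ext = external_mse(internal, external)
print(f"internal 5-fold MSE: {cv:.4f}, external MSE: {ext:.4f}")
```

    The internal estimate stays small because every held-out fold lies inside the range the model was trained on; only the external sample exposes the misspecified model, which is the kind of blind spot the abstract attributes to internal validation.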

    Mesoscopic organization reveals the constraints governing C. elegans nervous system

    One of the biggest challenges in biology is to understand how activity at the cellular level of neurons, arising from their mutual interactions, leads to the observed behavior of an organism responding to a variety of environmental stimuli. Investigating the intermediate, or mesoscopic, level of organization in the nervous system is a vital step towards understanding how the integration of micro-level dynamics results in macro-level functioning. In this paper, we consider the somatic nervous system of the nematode Caenorhabditis elegans, for which the entire neuronal connectivity diagram is known. We focus on the organization of the system into modules, i.e., neuronal groups with a relatively higher connection density than the overall network. We show that this mesoscopic feature cannot be explained solely by considerations such as optimizing for resource constraints (viz., total wiring cost) and communication efficiency (i.e., network path length). Comparison with other complex networks designed for efficient transport (of signals or resources) implies that neuronal networks form a distinct class. This suggests that the principal function of the network, viz., processing sensory information to produce an appropriate motor response, may play a vital role in determining the connection topology. Using modular spectral analysis, we make explicit the intimate relation between function and structure in the nervous system. This relation is further brought out by identifying functionally critical neurons purely on the basis of their patterns of intra- and inter-modular connections. Our study reveals how the design of the nervous system reflects several constraints, including its key functional role as a processor of information.
    Comment: Published version, minor modifications, 16 pages, 9 figures
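
    The notion of a module used above (a group of neurons with a higher connection density than the network as a whole) can be made concrete with a short sketch. It assumes an undirected, unweighted graph with no self-loops or duplicate edges, and it scores a partition with Newman-Girvan modularity as one standard summary; this is not necessarily the paper's modular spectral analysis. The toy graph (two triangles joined by a single edge) and the helper names `density` and `modularity` are hypothetical, not C. elegans data.

```python
def density(nodes, edges):
    """Fraction of possible undirected pairs among `nodes` that are connected."""
    nodes = set(nodes)
    n = len(nodes)
    if n < 2:
        return 0.0
    inside = sum(1 for u, v in edges if u in nodes and v in nodes)
    return inside / (n * (n - 1) / 2)

def modularity(edges, membership):
    """Newman-Girvan modularity Q = sum over modules c of [L_c/m - (d_c/(2m))**2].

    L_c is the number of edges inside module c, d_c is the total degree of its
    nodes, and m is the total number of edges (undirected, unweighted graph).
    """
    m = len(edges)
    intra = {}   # L_c per module
    degree = {}  # d_c per module
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            intra[cu] = intra.get(cu, 0) + 1
    return sum(intra.get(c, 0) / m - (d / (2 * m)) ** 2 for c, d in degree.items())

# Hypothetical toy network: two triangles (0-1-2 and 3-4-5) joined by edge 2-3.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
membership = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}

print(density(range(6), edges))    # overall density
print(density([0, 1, 2], edges))   # density inside module A
print(modularity(edges, membership))
```

    Each triangle is fully connected internally while the whole network is less than half connected, which is exactly the density contrast that defines a module; a positive Q summarizes that contrast relative to a degree-preserving random expectation.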

    A unified data representation theory for network visualization, ordering and coarse-graining

    Representation of large data sets has become a key question in many scientific disciplines over the last decade. Several approaches to network visualization, data ordering and coarse-graining have addressed this goal. However, there was no underlying theoretical framework linking these problems. Here we present an elegant, information-theoretic data representation approach as a unified solution to network visualization, data ordering and coarse-graining. The optimal representation is the one hardest to distinguish from the original data matrix, as measured by the relative entropy. Representing network nodes as probability distributions provides an efficient visualization method and, in one dimension, an ordering of network nodes and edges. Coarse-grained representations of the input network enable both efficient data compression and hierarchical visualization, yielding high-quality representations of larger data sets. Our unified data representation theory will aid the analysis of extensive data sets by revealing the large-scale structure of complex networks in a comprehensible form.
    Comment: 13 pages, 5 figures
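
    The relative entropy criterion can be illustrated with a small sketch. It assumes that both the data matrix and its representation are normalized into joint probability distributions P and Q, computes D(P || Q) = Σ p_ij log(p_ij / q_ij), and compares a toy network with a block-averaged coarse-graining of itself. The helpers `relative_entropy` and `block_average` are hypothetical illustrations; the paper's optimization over node probability distributions is not reproduced here.

```python
import math

def relative_entropy(A, B):
    """D(P || Q) in nats, where P and Q are matrices A and B normalized to sum to 1.

    Zero entries of A contribute nothing; if Q is zero where P is positive,
    the divergence is infinite.
    """
    sa = sum(map(sum, A))
    sb = sum(map(sum, B))
    d = 0.0
    for row_a, row_b in zip(A, B):
        for a, b in zip(row_a, row_b):
            if a > 0:
                p, q = a / sa, b / sb
                if q == 0:
                    return math.inf
                d += p * math.log(p / q)
    return d

def block_average(A, groups):
    """Coarse-grain A: replace each entry by the mean of its (group, group) block.

    Block means preserve block sums, so the total weight of A is unchanged.
    """
    n = len(A)
    total, count = {}, {}
    for i in range(n):
        for j in range(n):
            key = (groups[i], groups[j])
            total[key] = total.get(key, 0.0) + A[i][j]
            count[key] = count.get(key, 0) + 1
    return [[total[(groups[i], groups[j])] / count[(groups[i], groups[j])]
             for j in range(n)] for i in range(n)]

# Hypothetical toy network: two triangles joined by the edge 2-3.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
A = [[0] * 6 for _ in range(6)]
for u, v in edges:
    A[u][v] = A[v][u] = 1

coarse = block_average(A, [0, 0, 0, 1, 1, 1])
print(relative_entropy(A, coarse))  # information lost by the two-block coarse-graining
```

    A representation identical to the data gives D = 0, and the divergence grows as the coarse-graining smears out structure (here, the single bridge edge is spread over a whole 3×3 block), which is why the hardest-to-distinguish representation is the one that minimizes it.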

    Graphene membranes for water desalination

    Extensive environmental pollution caused by worldwide industrialization and population growth has led to water shortages. This problem lowers the quality of human life, and its consequences cost large sums of money worldwide each year. One main solution to this challenge is water purification. State-of-the-art water purification requires novel materials and technologies that are cost- and energy-efficient. In this regard, graphene nanomaterials, with their unique physicochemical properties, are an optimal choice. These materials offer extraordinarily high surface area, mechanical durability, atomic thickness, nanosized pores and reactivity toward polar and non-polar water pollutants. These characteristics impart high selectivity and water permeability, and thus provide excellent water purification efficiency. This review introduces the potential of graphene membranes for water desalination. Although previous reviews have mostly addressed graphene's capacity for the adsorption and photocatalysis of water pollutants, up-to-date knowledge of its sieving properties remains limited.
    Peer reviewed

    Functional Genetic Diversity among Mycobacterium tuberculosis Complex Clinical Isolates: Delineation of Conserved Core and Lineage-Specific Transcriptomes during Intracellular Survival

    Tuberculosis imposes a tremendous burden on global health, with ∼9 million new infections and ∼2 million deaths annually. The Mycobacterium tuberculosis complex (MTC) was initially regarded as a highly homogeneous population; however, recent data suggest that the causative agents of tuberculosis are more genetically and functionally diverse than previously appreciated. The impact of this natural variation on the virulence and clinical manifestations of the pathogen remains largely unknown. This report examines the effect of genetic diversity among MTC clinical isolates on global gene expression and survival within macrophages. We discovered lineage-specific transcription patterns in vitro and distinct intracellular growth profiles associated with specific responses to host-derived environmental cues. Strain comparisons also allowed us to delineate a core intracellular transcriptome, including genes whose regulation is highly conserved across the global panel of clinical isolates. This study offers new insights into the genetic information that M. tuberculosis has conserved under selective pressure during its long-term interactions with its human host.

    Intronic Cis-Regulatory Modules Mediate Tissue-Specific and Microbial Control of angptl4/fiaf Transcription

    The intestinal microbiota enhances dietary energy harvest, leading to increased fat storage in adipose tissues. This effect is caused in part by microbial suppression of intestinal epithelial expression of Angiopoietin-like 4 (Angptl4/Fiaf), a circulating inhibitor of lipoprotein lipase. To define the cis-regulatory mechanisms underlying intestine-specific and microbial control of Angptl4 transcription, we used the zebrafish system, in which host regulatory DNA can be rapidly analyzed in a live, transparent, and gnotobiotic vertebrate. We found that zebrafish angptl4 is transcribed in multiple tissues, including the liver, pancreatic islet, and intestinal epithelium, similar to its mammalian homologs. Zebrafish angptl4 is also specifically suppressed in the intestinal epithelium upon colonization with a microbiota. In vivo transgenic reporter assays identified discrete tissue-specific regulatory modules within angptl4 intron 3 that are sufficient to drive expression in the liver, pancreatic islet β-cells, or intestinal enterocytes. Comparative sequence analyses and heterologous functional assays of angptl4 intron 3 sequences from 12 teleost fish species revealed differential evolution of the islet and intestinal regulatory modules. High-resolution functional mapping and site-directed mutagenesis defined the minimal set of regulatory sequences required for intestinal activity. Strikingly, the microbiota suppressed the transcriptional activity of the intestine-specific regulatory module in the same way as it suppresses the endogenous angptl4 gene. These results suggest that the microbiota might regulate host intestinal Angptl4 protein expression and peripheral fat storage by suppressing the activity of an intestine-specific transcriptional enhancer. This study provides a useful paradigm for understanding how microbial signals interact with tissue-specific regulatory networks to control the activity and evolution of host gene transcription.

    Triangle network motifs predict complexes by complementing high-error interactomes with structural information

    Background: Many high-throughput studies produce protein-protein interaction networks (PPINs) with many errors and much missing information. Even for genome-wide approaches, the overlap between PPINs produced by different studies is often low. Second-level neighbors, separated by two protein-protein interactions (PPIs), have previously been used to predict protein function and to find complexes in high-error PPINs. We retrieve second-level neighbors in PPINs and complement them with structural domain-domain interactions (SDDIs), which represent evidence of binding between proteins, forming PPI-SDDI-PPI triangles.
    Results: We find low overlap between PPINs, SDDIs and known complexes, all well below 10%. We evaluate the overlap of PPI-SDDI-PPI triangles with known complexes from the Munich Information Center for Protein Sequences (MIPS). PPI-SDDI-PPI triangles have ~20 times higher overlap with MIPS complexes than second-level neighbors in PPINs without SDDIs. The biological interpretation of the triangles is that an SDDI causes two proteins to be observed with common interaction partners in high-throughput experiments. The relatively few SDDIs that overlap with PPINs are part of highly connected SDDI components and are more likely to be detected in experimental studies. We demonstrate the utility of PPI-SDDI-PPI triangles by reconstructing myosin-actin processes in the nucleus, cytoplasm, and cytoskeleton that were not obvious in the original PPIN. Using other complementary data types in place of SDDIs to form triangles, such as PubMed co-occurrences or threading information, yields a similar ability to find protein complexes.
    Conclusion: Given high-error PPINs with missing information, triangles of mixed data types are a promising direction for finding protein complexes. Integrating PPINs with SDDIs improves the detection of complexes. Structural SDDIs partially explain the high functional similarity of second-level neighbors in PPINs. We estimate that relatively little structural information would be sufficient to find complexes involving most of the proteins and interactions in a typical PPIN.
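
    The triangle construction can be sketched as follows, assuming the reading given in the abstract: two proteins u and w share a PPI partner v (so they are second-level neighbors), and u and w are also linked by an SDDI. The sketch further assumes that SDDIs have already been mapped from domain pairs to protein pairs, and scores overlap simply as the fraction of triangles lying entirely inside one known complex; the paper's exact overlap measure may differ. The protein names, complexes, and helper names are hypothetical, not MIPS data.

```python
from collections import defaultdict
from itertools import combinations

def ppi_sddi_ppi_triangles(ppis, sddis):
    """Return (u, v, w) triples where u-v and v-w are PPIs and u-w is an SDDI pair."""
    partners = defaultdict(set)
    for a, b in ppis:
        if a != b:  # ignore self-interactions
            partners[a].add(b)
            partners[b].add(a)
    sddi_pairs = {frozenset(pair) for pair in sddis}
    triangles = set()
    for v, nbrs in partners.items():
        # Every pair of v's partners are second-level neighbors via v.
        for u, w in combinations(sorted(nbrs), 2):
            if frozenset((u, w)) in sddi_pairs:
                triangles.add((u, v, w))
    return triangles

def complex_overlap(triangles, complexes):
    """Fraction of triangles whose three proteins all fall inside one known complex."""
    if not triangles:
        return 0.0
    hits = sum(1 for t in triangles if any(set(t) <= c for c in complexes))
    return hits / len(triangles)

# Hypothetical inputs for illustration only.
ppis = [("P1", "P2"), ("P2", "P3"), ("P2", "P4"), ("P4", "P5")]
sddis = [("P1", "P3"), ("P3", "P5")]
complexes = [{"P1", "P2", "P3"}, {"P4", "P5"}]

tri = ppi_sddi_ppi_triangles(ppis, sddis)
print(tri, complex_overlap(tri, complexes))
```

    The SDDI between P3 and P5 produces no triangle because the two proteins share no PPI partner, which illustrates how the SDDI layer filters second-level neighbor pairs rather than adding new ones.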

    Ubiquitous molecular substrates for associative learning and activity-dependent neuronal facilitation.

    Recent evidence suggests that many of the molecular cascades and substrates that contribute to learning-related forms of neuronal plasticity may be conserved across ostensibly disparate model systems. Notably, the facilitation of neuronal excitability and synaptic transmission that contributes to associative learning in Aplysia and Hermissenda, as well as associative LTP in hippocampal CA1 cells, all require (or are enhanced by) the convergence of a transient elevation in intracellular Ca²⁺ with transmitter binding to metabotropic cell-surface receptors. This temporal convergence of Ca²⁺ and G-protein-stimulated second-messenger cascades synergistically stimulates several classes of serine/threonine protein kinases, which in turn modulate receptor function or cell excitability through the phosphorylation of ion channels. We summarize the biophysical and molecular constituents of neuronal and synaptic facilitation in each of these three model systems. Although specific components of the underlying molecular cascades differ across the three systems, fundamental aspects of these cascades are widely conserved, leading to the conclusion that the conceptual similarity of these superficially disparate systems is far greater than is generally acknowledged. We suggest that elucidating the mechanistic similarities between different systems will ultimately fulfill the goal of the model-systems approach, that is, the description of critical and ubiquitous features of neuronal and synaptic events that contribute to memory induction.