181 research outputs found

    Local multiresolution order in community detection

    Community detection algorithms attempt to find the best clusters of nodes in an arbitrary complex network. Multi-scale ("multiresolution") community detection extends the problem to identify the best network scale(s) for these clusters. The latter task is generally accomplished by analyzing community stability simultaneously for all clusters in the network. In the current work, we extend this general approach to define local multiresolution methods, which enable the extraction of well-defined local communities even if the global community structure is vaguely defined in an average sense. Toward this end, we propose measures analogous to variation of information and normalized mutual information that are used to quantitatively identify the best resolution(s) at the community level based on correlations between clusters in independently solved systems. We demonstrate our method on two constructed networks as well as a real network and draw inferences about local community strength. Our approach is independent of the applied community detection algorithm save for the inherent requirement that the method be able to identify communities across different network scales, with appropriate changes to account for how different resolutions are evaluated or defined in a particular community detection method. It should, in principle, easily adapt to alternative community comparison measures. Comment: 19 pages, 11 figures
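
For orientation, the standard global forms of the two measures named in this abstract can be sketched as follows. This is a minimal illustration of textbook variation of information and normalized mutual information between two partitions, not the local, community-level analogues the authors propose; the function names and the list-of-labels input format are assumptions made for the example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a partition given as one label per node."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information between two partitions of the same node set."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))  # joint[(i, j)] = nodes in cluster i of a and j of b
    return sum(
        (n_ij / n) * math.log(n_ij * n / (count_a[i] * count_b[j]))
        for (i, j), n_ij in joint.items()
    )


def variation_of_information(a, b):
    """VI = H(a) + H(b) - 2 I(a; b); zero when the partitions coincide."""
    return entropy(a) + entropy(b) - 2 * mutual_information(a, b)


def normalized_mutual_information(a, b):
    """NMI = 2 I(a; b) / (H(a) + H(b)); defined as 1.0 if both entropies vanish."""
    h_sum = entropy(a) + entropy(b)
    if h_sum == 0:
        return 1.0
    return 2 * mutual_information(a, b) / h_sum


if __name__ == "__main__":
    # Two independent solutions ("replicas") of the same 4-node network.
    replica_1 = [0, 0, 1, 1]
    replica_2 = [1, 1, 0, 0]  # same clusters, different label names
    print(variation_of_information(replica_1, replica_2))
    print(normalized_mutual_information(replica_1, replica_2))
```

In a multiresolution setting, values like these would be averaged over pairs of independently solved replicas at each resolution, with extrema indicating candidate scales.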

    Inference of hidden structures in complex physical systems by multi-scale clustering

    We survey the application of a relatively new branch of statistical physics, "community detection", to data mining. In particular, we focus on the diagnosis of materials and automated image segmentation. Community detection describes the quest of partitioning a complex system involving many elements into optimally decoupled subsets or communities of such elements. We review a multiresolution variant which is used to ascertain structures at different spatial and temporal scales. Significant patterns are obtained by examining the correlations between different independent solvers. Similar to other combinatorial optimization problems in the NP complexity class, community detection exhibits several phases. Typically, illuminating orders are revealed by choosing parameters that lead to extremal information theory correlations. Comment: 25 pages, 16 Figures; a review of earlier work

    Evidence for the role of EPHX2 gene variants in anorexia nervosa.

    Anorexia nervosa (AN) and related eating disorders are complex, multifactorial neuropsychiatric conditions with likely rare and common genetic and environmental determinants. To identify genetic variants associated with AN, we pursued a series of sequencing and genotyping studies focusing on the coding regions and upstream sequence of 152 candidate genes in a total of 1205 AN cases and 1948 controls. We identified individual variant associations in the Estrogen Receptor-β (ESR2) gene, as well as a set of rare and common variants in the Epoxide Hydrolase 2 (EPHX2) gene, in an initial sequencing study of 261 early-onset severe AN cases and 73 controls (P=0.0004). The association of EPHX2 variants was further delineated in: (1) a pooling-based replication study involving an additional 500 AN patients and 500 controls (replication set P=0.00000016); (2) single-locus studies in a cohort of 386 previously genotyped broadly defined AN cases and 295 female population controls from the Bogalusa Heart Study (BHS) and a cohort of 58 individuals with self-reported eating disturbances and 851 controls (combined smallest single locus P<0.01). As EPHX2 is known to influence cholesterol metabolism, and AN is often associated with elevated cholesterol levels, we also investigated the association of EPHX2 variants and longitudinal body mass index (BMI) and cholesterol in BHS female and male subjects (N=229) and found evidence for a modifying effect of a subset of variants on the relationship between cholesterol and BMI (P<0.01). These findings suggest a novel association of gene variants within EPHX2 with susceptibility to AN and provide a foundation for future study of this important yet poorly understood condition.

    The microstructure of coaching practice: Behaviours and activities of an elite rugby union head coach during preparation and competition

    The activities and behaviours of a female head coach of a national rugby union team were recorded in both training and competition, across a whole rugby season, using the newly developed Rugby Coach Activities and Behaviours Instrument (RCABI). The instrument incorporates 24 categories of behaviour, embedded within three forms of activity (training form, playing form and competitive match) and seven sub-activity types. In contrast to traditional drill-based coaching, 58.5% of training time was found to have been spent in playing form activities. Moreover, the proportion of playing form activities increased to a peak average of 83.8% in proximity to the team’s annual international championship. Uniquely, one of the coach’s most prolific behaviours was conferring with associates (23.3%), highlighting the importance of interactions with assistant coaches, medical staff and others in shaping the coaching process. Additionally, the frequencies of key behaviours such as questioning and praise were found to vary between the different activity forms and types, raising questions about previous conceptions of effective coaching practice. The findings are discussed in the light of the Game Sense philosophy and the role of the head coach.

    Combined node and link partitions method for finding overlapping communities in complex networks

    Community detection in complex networks is a fundamental data analysis task in various domains, and how to effectively find overlapping communities in real applications is still a challenge. In this work, we propose a new unified model and method for finding the best overlapping communities on the basis of the associated node and link partitions derived from the same framework. Specifically, we first describe a unified model that accommodates node and link communities (partitions) together, and then present a nonnegative matrix factorization method to learn the parameters of the model. Thereafter, we infer the overlapping communities based on the derived node and link communities, i.e., we determine each overlapped community between the corresponding node and link community with a greedy optimization of a local community function, conductance. Finally, we introduce a model selection method based on consensus clustering to determine the number of communities. We have evaluated our method on both synthetic and real-world networks with ground truths, and compared it with seven state-of-the-art methods. The experimental results demonstrate the superior performance of our method over the competing ones in detecting overlapping communities for all analysed data sets. Improved performance is particularly pronounced in cases of more complicated networked community structures.
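
The conductance function mentioned in this abstract has a standard definition for undirected, unweighted graphs: the number of edges leaving a node set divided by the smaller of the two degree volumes. A minimal sketch follows, assuming an adjacency-dictionary input; the function name and the example graph are illustrative, and the greedy optimization step from the paper is not reproduced here.

```python
def conductance(adjacency, community):
    """Conductance of `community` in an undirected, unweighted graph.

    adjacency: dict mapping each node to an iterable of its neighbours
               (every edge must appear in both directions).
    community: iterable of nodes forming the candidate community.
    """
    members = set(community)
    cut = 0          # edges with exactly one endpoint inside the community
    vol_inside = 0   # sum of degrees of nodes inside the community
    vol_outside = 0  # sum of degrees of all other nodes
    for node, neighbours in adjacency.items():
        neighbours = list(neighbours)
        if node in members:
            vol_inside += len(neighbours)
            cut += sum(1 for nb in neighbours if nb not in members)
        else:
            vol_outside += len(neighbours)
    smaller_volume = min(vol_inside, vol_outside)
    if smaller_volume == 0:
        raise ValueError("conductance is undefined when one side has zero volume")
    return cut / smaller_volume


if __name__ == "__main__":
    # Hypothetical example: two triangles joined by the single edge 2-3.
    graph = {
        0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
        3: [2, 4, 5], 4: [3, 5], 5: [3, 4],
    }
    print(conductance(graph, {0, 1, 2}))  # one triangle: a single cut edge
    print(conductance(graph, {0, 1}))     # a worse candidate with two cut edges
```

Lower values indicate a better-separated community, so a greedy procedure would add or remove nodes while the value keeps decreasing.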

    Genome-Wide Modeling of Transcription Preinitiation Complex Disassembly Mechanisms using ChIP-chip Data

    Apparent occupancy levels of proteins bound to DNA in vivo can now be routinely measured on a genomic scale. A challenge in relating these occupancy levels to assembly mechanisms that are defined with biochemically isolated components lies in the veracity of assumptions made regarding the in vivo system. Assumptions regarding the behavior of molecules in vivo can be proven neither true nor false, and thus are necessarily subjective. Nevertheless, within those confines, connecting in vivo protein-DNA interaction observations with defined biochemical mechanisms is an important step towards fully defining and understanding assembly/disassembly mechanisms in vivo. To this end, we have developed a computational program, PathCom, that models in vivo protein-DNA occupancy data as biochemical mechanisms under the assumption that occupancy levels can be related to binding duration and explicitly defined assembly/disassembly reactions. We exemplify the process with the assembly of the general transcription factors (TBP, TFIIB, TFIIE, TFIIF, TFIIH, and RNA polymerase II) at the genes of the budding yeast Saccharomyces. Within the assumptions inherent in the system, our modeling suggests that TBP occupancy at promoters is rather transient compared to other general factors, despite the importance of TBP in nucleating assembly of the preinitiation complex. PathCom is suitable for modeling any assembly/disassembly pathway, given that all the proteins (or species) come together to form a complex.

    Tight cooperation between Mot1p and NC2β in regulating genome-wide transcription, repression of transcription following heat shock induction and genetic interaction with SAGA

    TATA-binding protein (TBP) is central to the regulation of eukaryotic transcription initiation. Recruitment of TBP to target genes can be positively regulated by one of two basal transcription factor complexes: SAGA or TFIID. Negative regulation of TBP promoter association can be performed by Mot1p or the NC2 complex. Recent evidence suggests that Mot1p, NC2 and TBP form a DNA-dependent protein complex. Here, we compare the functions of Mot1p and NC2β during basal and activated transcription using the anchor-away technique for conditional nuclear depletion. Genome-wide expression analysis indicates that both proteins regulate a highly similar set of genes. Upregulated genes were enriched for SAGA occupancy, while downregulated genes preferred TFIID binding. Mot1p and NC2β depletion during heat shock resulted in failure to downregulate gene expression after initial activation, which was accompanied by increased TBP and RNA pol II promoter occupancies. Depletion of Mot1p or NC2β displayed preferential synthetic lethality with the TBP-interaction module of SAGA. Our results support the model that Mot1p and NC2β directly cooperate in vivo to regulate TBP function, and that they are involved in maintaining basal expression levels as well as in resetting gene expression after induction by stress.

    Triad pattern algorithm for predicting strong promoter candidates in bacterial genomes

    Background: Bacterial promoters, which increase the efficiency of gene expression, differ from other promoters by several characteristics. This difference, not yet widely exploited in bioinformatics, looks promising for the development of relevant computational tools to search for strong promoters in bacterial genomes. Results: We describe a new triad pattern algorithm that predicts strong promoter candidates in annotated bacterial genomes by matching specific patterns for the group I σ70 factors of Escherichia coli RNA polymerase. It detects promoter-specific motifs by consecutively matching three patterns, consisting of an UP-element, required for interaction with the α subunit, and then optimally separated patterns of -35 and -10 boxes, required for interaction with the σ70 subunit of RNA polymerase. Analysis of 43 bacterial genomes revealed that the frequency of candidate sequences depends on the A+T content of the DNA under examination. The accuracy of in silico prediction was experimentally validated for the genome of a hyperthermophilic bacterium, Thermotoga maritima, by applying a cell-free expression assay using the predicted strong promoters. In this organism, the strong promoters govern genes for translation, energy metabolism, transport, cell movement, and other as-yet unidentified functions. Conclusion: The triad pattern algorithm developed for predicting strong bacterial promoters is well suited for analyzing bacterial genomes with an A+T content of less than 62%. This computational tool opens new prospects for investigating global gene expression, and individual strong promoters in bacteria of medical and/or economic significance.
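
The idea of consecutively matching three patterns can be illustrated with a single regular expression. The sketch below is only an assumption-laden toy: the UP-element pattern is a hypothetical stand-in (an A+T-rich run), the -35 and -10 boxes use the textbook σ70 consensus hexamers with a 16 to 18 bp spacer, and the allowed gap after the UP-element is arbitrary. None of these are the patterns used by the published algorithm.

```python
import re

# All three patterns are illustrative assumptions, not the published ones.
UP_ELEMENT = r"[AT]{8,}"  # hypothetical stand-in: an A+T-rich stretch
BOX_35 = "TTGACA"         # textbook sigma70 -35 consensus hexamer
BOX_10 = "TATAAT"         # textbook sigma70 -10 consensus hexamer

# UP-element, short gap (assumed 0-6 bp), -35 box, 16-18 bp spacer, -10 box.
TRIAD = re.compile(
    rf"(?P<up>{UP_ELEMENT})[ACGT]{{0,6}}"
    rf"(?P<box35>{BOX_35})[ACGT]{{16,18}}"
    rf"(?P<box10>{BOX_10})"
)


def find_triads(sequence):
    """Return (start, end) spans where the three patterns occur in order."""
    return [match.span() for match in TRIAD.finditer(sequence.upper())]


def at_content(sequence):
    """Fraction of A and T bases, the genome property the abstract refers to."""
    seq = sequence.upper()
    return (seq.count("A") + seq.count("T")) / len(seq)


if __name__ == "__main__":
    # Synthetic sequence built to contain exactly one candidate.
    demo = "GC" + "A" * 10 + "GC" + BOX_35 + "G" * 17 + BOX_10 + "GC"
    print(find_triads(demo))
    print(at_content(demo))
```

A realistic scanner would tolerate mismatches in each box and score candidates rather than requiring exact consensus matches.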