    Distance, dissimilarity index, and network community structure

    We address the question of finding the community structure of a complex network. In an earlier effort [H. Zhou, Phys. Rev. E (2003)], the concept of a random walk on a network was introduced and a distance measure was defined. Here, based on this distance measure, we calculate the dissimilarity index between nearest-neighboring vertices of a network and design an algorithm that partitions the vertices into hierarchically organized communities. Each community is characterized by an upper and a lower dissimilarity threshold. The algorithm is applied to several artificial and real-world networks, with excellent results. For artificially generated random modular networks, this method outperforms the algorithm based on edge betweenness centrality. For the yeast protein-protein interaction network, we identify many clusters that have well-defined biological functions. Comment: 10 pages, 7 figures, REVTeX4 format
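
    The abstract does not restate the definitions from the 2003 paper. The sketch below assumes, as we understand Zhou's construction, that the distance d_ij is the mean number of steps an unbiased random walker starting at vertex i needs to first reach vertex j. It also assumes that the dissimilarity index of neighbours i and j is the root-mean-square difference of their distance profiles, Λ(i,j) = sqrt(Σ_{k≠i,j} (d_ik − d_jk)² / (n − 2)). Only Λ is computed here. The hierarchical partitioning with upper and lower thresholds is not shown, and all function names are hypothetical.

```python
import math


def solve_linear(aug):
    """Gauss-Jordan elimination with partial pivoting on an augmented matrix [A | b]."""
    m = len(aug)
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(m):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                for c in range(col, m + 1):
                    aug[r][c] -= factor * aug[col][c]
    return [aug[r][m] / aug[r][r] for r in range(m)]


def first_passage_times(adj, target):
    """Mean steps for an unbiased random walk from each vertex to first reach `target`.

    Assumes a connected, unweighted graph given as adjacency lists.
    """
    others = [v for v in range(len(adj)) if v != target]
    row = {v: r for r, v in enumerate(others)}
    m = len(others)
    # One equation per non-target vertex i: h_i - sum_{k != target} P_ik * h_k = 1
    aug = [[0.0] * m + [1.0] for _ in range(m)]
    for v in others:
        aug[row[v]][row[v]] += 1.0
        for u in adj[v]:
            if u != target:
                aug[row[v]][row[u]] -= 1.0 / len(adj[v])
    h = solve_linear(aug)
    times = [0.0] * len(adj)
    for v in others:
        times[v] = h[row[v]]
    return times


def distance_matrix(adj):
    """d[i][j] = mean first-passage time from i to j (d[i][i] = 0)."""
    n = len(adj)
    columns = [first_passage_times(adj, j) for j in range(n)]
    return [[columns[j][i] for j in range(n)] for i in range(n)]


def dissimilarity(d, i, j):
    """RMS difference of the distance profiles of i and j over the other n - 2 vertices."""
    n = len(d)
    total = sum((d[i][k] - d[j][k]) ** 2 for k in range(n) if k not in (i, j))
    return math.sqrt(total / (n - 2))


# Two triangles (0, 1, 2) and (3, 4, 5) joined by the bridge edge 2-3.
adj = [[1, 2], [0, 2], [0, 1, 3], [2, 4, 5], [3, 5], [3, 4]]
d = distance_matrix(adj)
print(dissimilarity(d, 0, 1))  # 0 up to rounding: vertices 0 and 1 are interchangeable
print(dissimilarity(d, 2, 3))  # approximately 7.0: the bridge joins two different groups
```

    In this toy graph, the within-triangle pair has dissimilarity zero while the bridge pair scores high. This is the contrast a threshold-based partition would exploit.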

    Historical urban growth in Europe (1300–1800)

    This paper analyses the evolution of the European urban system from a long-term perspective (1300 to 1800). Using the method recently proposed by Clauset, Shalizi, and Newman, a Pareto-type city size distribution (power law) is rejected for 1300 to 1600. A power law is a plausible model for the city size distribution only in 1700 and 1800, although the log-normal distribution is another plausible model that we cannot reject. Moreover, the random growth of cities is rejected using both parametric and non-parametric methods. The results reveal a clear pattern of convergent growth in all periods.
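
    The abstract names the Clauset-Shalizi-Newman method without detail. Below is a minimal sketch of its fitting step, assuming continuous data. For a candidate lower cutoff x_min, the exponent is estimated by maximum likelihood as α̂ = 1 + n / Σ ln(x_i / x_min), and x_min is chosen to minimise the Kolmogorov-Smirnov distance between the data and the fitted tail. The bootstrap goodness-of-fit p-value, which is what actually rejects or accepts a power law, is omitted. So is the likelihood-ratio comparison with the log-normal. The function names and the `min_tail` parameter are hypothetical.

```python
import math


def ks_distance(tail, xmin, alpha):
    """Max gap between the empirical CDF of the sorted `tail` and the fitted power-law CDF."""
    n = len(tail)
    gap = 0.0
    for i, x in enumerate(tail):
        model = 1.0 - (x / xmin) ** (1.0 - alpha)
        # compare the model with the empirical CDF just before and at x
        gap = max(gap, abs(model - i / n), abs(model - (i + 1) / n))
    return gap


def fit_power_law(xs, min_tail=10):
    """Return (xmin, alpha, ks) with the smallest KS distance, or None if the data are too few."""
    xs = sorted(xs)
    best = None
    for start, xmin in enumerate(xs):
        if start > 0 and xs[start - 1] == xmin:
            continue  # try each distinct value once, keeping all of its ties in the tail
        tail = xs[start:]
        if len(tail) < min_tail:
            break
        log_sum = sum(math.log(x / xmin) for x in tail)
        if log_sum == 0.0:
            break  # all remaining values equal xmin; the estimator is undefined
        alpha = 1.0 + len(tail) / log_sum  # continuous maximum-likelihood estimate
        ks = ks_distance(tail, xmin, alpha)
        if best is None or ks < best[2]:
            best = (xmin, alpha, ks)
    return best


# Synthetic check: evenly spaced quantiles of a power law with alpha = 2.5 and xmin = 1.
n = 1000
sample = [(1.0 - (i + 0.5) / n) ** (-1.0 / 1.5) for i in range(n)]
xmin, alpha, ks = fit_power_law(sample)
print(round(alpha, 2))  # close to 2.5
```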

    neXtProt: a knowledge platform for human proteins

    neXtProt (http://www.nextprot.org/) is a new human protein-centric knowledge platform. Developed at the Swiss Institute of Bioinformatics (SIB), it aims to help researchers answer questions relevant to human proteins. To achieve this goal, neXtProt is built on a corpus containing both curated knowledge originating from the UniProtKB/Swiss-Prot knowledgebase and carefully selected and filtered high-throughput data pertinent to human proteins. This article presents an overview of the database and the data integration process. We also lay out the key future directions that we consider necessary to make neXtProt the one-stop shop for all research projects focusing on human proteins.

    The PROSITE database, its status in 1999

    The PROSITE database (http://www.expasy.ch/sprot/prosite.html) consists of biologically significant patterns and profiles, formulated so that, with appropriate computational tools, it can help determine which known protein family (if any) a new sequence belongs to, or which known domain(s) it contains.
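
    PROSITE patterns use a compact notation. Elements are separated by '-'. 'x' stands for any residue, [..] lists allowed residues and {..} forbidden ones. (n) or (n,m) gives a repetition count, and '<' and '>' anchor the N- and C-terminus. The sketch below assumes only this core syntax and translates a pattern into a Python regular expression that can be scanned against a sequence. `prosite_to_regex` is a hypothetical helper, not part of any PROSITE tool. Special cases such as '>' inside brackets are not handled, and the example pattern and sequence are illustrative.

```python
import re


def prosite_to_regex(pattern):
    """Translate core PROSITE pattern syntax into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):  # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):  # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        repeat = ""
        m = re.fullmatch(r"(.+?)\((\d+(?:,\d+)?)\)", element)
        if m:  # x(3) -> .{3}, x(2,4) -> .{2,4}
            element, repeat = m.group(1), "{" + m.group(2) + "}"
        if element == "x":
            body = "."
        elif element.startswith("{"):  # forbidden residues become a negated class
            body = "[^" + element[1:-1] + "]"
        else:  # single residues and [..] classes carry over unchanged
            body = element
        parts.append(prefix + body + repeat + suffix)
    return "".join(parts)


regex = prosite_to_regex("C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H.")
print(regex)  # C.{2,4}C.{3}[LIVMFYWC].{8}H.{3,5}H
sequence = "AAC" + "KK" + "C" + "RST" + "F" + "Q" * 8 + "H" + "NNN" + "H" + "GG"  # made-up
match = re.search(regex, sequence)
print(match.span() if match else "no match")
```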

    An approach to describing and analysing bulk biological annotation quality: a case study using UniProtKB

    Motivation: Annotations are a key feature of many biological databases, used to convey our knowledge of a sequence to the reader. Ideally, annotations are curated manually; however, manual curation is costly and time-consuming, and it requires expert knowledge and training. Given these issues and the exponential increase of data, many databases implement automated annotation pipelines in an attempt to avoid unannotated entries. Both manual and automated annotations vary in quality between databases and annotators, which makes it difficult for users to assess annotation reliability. The community lacks a generic measure for determining annotation quality and correctness, and this article aims to address that gap. Specifically, we investigate word reuse within bulk textual annotations and relate it to Zipf's Principle of Least Effort. We use the UniProt Knowledgebase (UniProtKB) as a case study, since it allows us to compare annotation change both over time and between automated and manually curated annotations. Results: By applying power-law distributions to word reuse in annotation, we show clear trends in UniProtKB over time, which are consistent with existing studies of quality in free-text English. Further, we show a clear distinction between manual and automated analysis and investigate cohorts of protein records as they mature. These results suggest that this approach holds distinct promise as a mechanism for judging annotation quality. Availability: Source code is available at the authors' website: http://homepages.cs.ncl.ac.uk/m.j.bell1/annotation. Comment: Paper accepted at The European Conference on Computational Biology 2012 (ECCB'12); it will subsequently be published in a special issue of the journal Bioinformatics. The paper consists of 8 pages and 5 figures.
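
    The abstract does not give the fitting procedure. The sketch below is a rough illustration only, not the authors' method. It builds the rank-frequency list of words across a set of annotation strings and estimates a Zipf-style exponent from the least-squares slope on log-log axes. Least-squares slopes are known to be biased for power-law data, and maximum-likelihood fits are generally preferred, so the number should be read as descriptive. The function names and the sample annotations are hypothetical.

```python
import math
import re
from collections import Counter


def rank_frequency(texts):
    """Word counts across all annotation strings, sorted from most to least frequent."""
    counts = Counter(w for t in texts for w in re.findall(r"[a-z0-9]+", t.lower()))
    return sorted(counts.values(), reverse=True)


def zipf_slope(freqs):
    """Least-squares slope of log(frequency) against log(rank); needs at least two ranks."""
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var


annotations = ["Protein kinase activity.", "Putative protein kinase.", "Uncharacterized protein."]
print(rank_frequency(annotations))  # [3, 2, 1, 1, 1]
print(zipf_slope([60, 30, 20, 15, 12, 10]))  # about -1.0 for frequencies proportional to 1/rank
```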

    A survey of orphan enzyme activities

    Background: Using computational database searches, we have demonstrated previously that no gene sequences could be found for at least 36% of enzyme activities that have been assigned an Enzyme Commission number. Here we present a follow-up literature-based survey involving a statistically significant sample of such "orphan" activities. The survey was intended to determine whether sequences for these enzyme activities are truly unknown, or whether these sequences are absent from the public sequence databases but can be found in the literature. Results: We demonstrate that for ~80% of sampled orphans, the absence of sequence data is bona fide. Our analyses further substantiate the notion that many of these enzyme activities play biologically important roles. Conclusion: This survey points to the significant scientific cost of having such a large fraction of characterized enzyme activities disconnected from sequence data. It also suggests that a larger effort, beginning with a comprehensive survey of all putative orphan activities, would resolve nearly 300 artifactual orphans and reconnect a wealth of enzyme research with modern genomics. For these reasons, we propose that a systematic effort to identify the cognate genes of orphan enzymes be undertaken.

    Economic Backwardness and Social Tension

    We propose that relative economic backwardness contributes to the build-up of social tension and to non-violent and violent conflict. We test this hypothesis using data on organized mass movements and armed civil conflict. The findings show that greater economic backwardness is consistently linked to a higher probability of the onset of violent and especially non-violent forms of civil unrest. Instrumental-variables estimations using new instruments, including mailing speeds and telegram charges around 1900, provide evidence that the relationship is causal. The magnitude of the effect of backwardness on social tension increases in the two-stage least-squares estimations.
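
    With one endogenous regressor and one instrument, two-stage least squares works in two steps. It first regresses the regressor on the instrument, then regresses the outcome on the fitted values, which reduces to Cov(z, y) / Cov(z, x). The sketch below is a generic illustration on made-up numbers, not the paper's specification; the paper's outcomes, controls and instruments are not reproduced. The function names are hypothetical. Standard errors from a naive second-stage regression would be incorrect, so none are computed.

```python
def ols_slope_intercept(x, y):
    """Simple linear regression y = a + b * x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum((xi - mx) ** 2 for xi in x)
    return my - b * mx, b


def two_stage_least_squares(y, x, z):
    """2SLS point estimate with one endogenous regressor x and one instrument z."""
    a1, b1 = ols_slope_intercept(z, x)  # first stage: x on z
    x_hat = [a1 + b1 * zi for zi in z]  # the part of x explained by the instrument
    _, beta = ols_slope_intercept(x_hat, y)  # second stage: y on fitted x
    return beta


# Hypothetical data: v is a confounder that shifts x and y but is uncorrelated with z.
z = [1, 2, 3, 4, 5, 6]
v = [1, -1, -1, -1, -1, 1]
x = [zi + vi for zi, vi in zip(z, v)]
y = [2 * xi + 3 * vi for xi, vi in zip(x, v)]
print(ols_slope_intercept(x, y)[1])  # biased upward by the confounder
print(two_stage_least_squares(y, x, z))  # 2.0 up to rounding: the effect built into the data
```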

    Complex networks theory for analyzing metabolic networks

    One of the main tasks of post-genomic informatics is to systematically investigate all molecules and their interactions within a living cell, in order to understand how these molecules and their interactions relate to the function of the organism; networks are an appropriate abstract description of all kinds of interactions. In the past few years, great progress has been made in developing the theory of complex networks to reveal the organizing principles that govern the formation and evolution of various complex biological, technological and social networks. This paper reviews the accomplishments in constructing genome-based metabolic networks and describes how the theory of complex networks is applied to analyze metabolic networks. Comment: 13 pages, 2 figures

    Political Regimes and Sovereign Credit Risk in Europe, 1750-1913

    This article uses a new panel data set to perform a statistical analysis of political regimes and sovereign credit risk in Europe from 1750 to 1913. Old Regime polities typically suffered from fiscal fragmentation and absolutist rule. By the start of World War I, however, many such countries had centralized institutions and limited government. Panel regressions indicate that centralized and/or limited regimes were associated with significant improvements in credit risk relative to fragmented and absolutist ones. Structural break tests also reveal close relationships between major turning points in yield series and political transformations.

    Protein folding using contact maps

    We develop the idea of using dynamics in the space of contact maps as a computational approach to the protein folding problem. We first introduce two important technical ingredients: the reconstruction of a three-dimensional conformation from a contact map, and Monte Carlo dynamics in contact map space. We then discuss two approximations to the free energy of contact maps and a method to derive energy parameters based on perceptron learning. Finally, we present results, first for predictions based on threading and then for energy minimization of crambin and of a set of 6 immunoglobulins. Our main result is a proof that the two simple approximations to the free energy that we studied are not suitable for protein folding. Perspectives are discussed in the last section. Comment: 29 pages, 10 figures
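
    The contact map is the basic object of this abstract. The sketch below shows how one is commonly derived from a three-dimensional structure, assuming one representative point per residue (for example the C-alpha atom) and a distance cutoff. The 8 Å default and the hypothetical `min_separation` parameter are illustrative choices, not values taken from the paper. The harder reverse step, reconstructing coordinates from a map, is not shown.

```python
import math


def contact_map(coords, threshold=8.0, min_separation=1):
    """Symmetric 0/1 matrix: C[i][j] = 1 when residues i and j are closer than `threshold`.

    Pairs with |i - j| < min_separation are never marked as contacts
    (min_separation=1 only excludes the diagonal).
    """
    n = len(coords)
    cmap = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) < threshold:
                cmap[i][j] = cmap[j][i] = 1
    return cmap


# Hypothetical straight chain: four residues 3.8 angstrom apart along the x axis.
chain = [(3.8 * i, 0.0, 0.0) for i in range(4)]
for row in contact_map(chain):
    print(row)
```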