Distance, dissimilarity index, and network community structure
We address the question of finding the community structure of a complex
network. In an earlier effort [H. Zhou, Phys. Rev. E (2003)], the concept
of network random walking is introduced and a distance measure defined. Here we
calculate, based on this distance measure, the dissimilarity index between
nearest-neighboring vertices of a network and design an algorithm to partition
these vertices into communities that are hierarchically organized. Each
community is characterized by an upper and a lower dissimilarity threshold. The
algorithm is applied to several artificial and real-world networks, and
excellent results are obtained. In the case of artificially generated random
modular networks, this method outperforms the algorithm based on the concept of
edge betweenness centrality. For yeast's protein-protein interaction network,
we are able to identify many clusters that have well defined biological
functions.
Comment: 10 pages, 7 figures, REVTeX4 format
Historical urban growth in Europe (1300–1800)
This paper analyses the evolution of the European urban system from a long-term perspective (from 1300 to 1800). Using the method recently proposed by Clauset, Shalizi, and Newman, a Pareto-type city size distribution (power law) is rejected from 1300 to 1600. A power law is a plausible model for the city size distribution only in 1700 and 1800, although the log-normal distribution is another plausible alternative model that we cannot reject. Moreover, the random growth of cities is rejected using parametric and non-parametric methods. The results reveal a clear pattern of convergent growth in all the periods.
neXtProt: a knowledge platform for human proteins
neXtProt (http://www.nextprot.org/) is a new human protein-centric knowledge platform. Developed at the Swiss Institute of Bioinformatics (SIB), it aims to help researchers answer questions relevant to human proteins. To achieve this goal, neXtProt is built on a corpus containing both curated knowledge originating from the UniProtKB/Swiss-Prot knowledgebase and carefully selected and filtered high-throughput data pertinent to human proteins. This article presents an overview of the database and the data integration process. We also lay out the key future directions of neXtProt that we consider the necessary steps to make neXtProt the one-stop-shop for all research projects focusing on human proteins.
The PROSITE database, its status in 1999
The PROSITE database (http://www.expasy.ch/sprot/prosite.html) consists of biologically significant patterns and profiles formulated in such a way that with appropriate computational tools it can help to determine to which known family of protein (if any) a new sequence belongs, or which known domain(s) it contains.
An approach to describing and analysing bulk biological annotation quality: a case study using UniProtKB
Motivation: Annotations are a key feature of many biological databases, used
to convey our knowledge of a sequence to the reader. Ideally, annotations are
curated manually; however, manual curation is costly, time-consuming and
requires expert knowledge and training. Given these issues and the exponential
increase of data, many databases implement automated annotation pipelines in an
attempt to avoid un-annotated entries. Both manual and automated annotations
vary in quality between databases and annotators, making assessment of
annotation reliability problematic for users. The community lacks a generic
measure for determining annotation quality and correctness, which we look at
addressing within this article. Specifically we investigate word reuse within
bulk textual annotations and relate this to Zipf's Principle of Least Effort.
We use UniProt Knowledge Base (UniProtKB) as a case study to demonstrate this
approach since it allows us to compare annotation change, both over time and
between automated and manually curated annotations.
Results: By applying power-law distributions to word reuse in annotation, we
show clear trends in UniProtKB over time, which are consistent with existing
studies of quality on free text English. Further, we show a clear distinction
between manual and automated analysis and investigate cohorts of protein
records as they mature. These results suggest that this approach holds distinct
promise as a mechanism for judging annotation quality.
Availability: Source code is available at the authors' website:
http://homepages.cs.ncl.ac.uk/m.j.bell1/annotation.
Contact: [email protected]
Comment: Paper accepted at The European Conference on Computational Biology
2012 (ECCB'12). Subsequently will be published in a special issue of the
journal Bioinformatics. Paper consists of 8 pages, made up of 5 figures
A survey of orphan enzyme activities
Background: Using computational database searches, we have demonstrated previously that no gene sequences could be found for at least 36% of enzyme activities that have been assigned an Enzyme Commission number. Here we present a follow-up literature-based survey involving a statistically significant sample of such "orphan" activities. The survey was intended to determine whether sequences for these enzyme activities are truly unknown, or whether these sequences are absent from the public sequence databases but can be found in the literature. Results: We demonstrate that for ~80% of sampled orphans, the absence of sequence data is bona fide. Our analyses further substantiate the notion that many of these enzyme activities play biologically important roles. Conclusion: This survey points toward significant scientific cost of having such a large fraction of characterized enzyme activities disconnected from sequence data. It also suggests that a larger effort, beginning with a comprehensive survey of all putative orphan activities, would resolve nearly 300 artifactual orphans and reconnect a wealth of enzyme research with modern genomics. For these reasons, we propose that a systematic effort to identify the cognate genes of orphan enzymes be undertaken.
Economic Backwardness and Social Tension
We propose that relative economic backwardness contributes to the build-up of social tension and non-violent and violent conflict. We test our hypothesis using data on organized mass movements and armed civil conflict. The findings show that greater economic backwardness is consistently linked to a higher probability of onset of violent and especially non-violent forms of civil unrest. We provide evidence that the relationship is causal in instrumental variables estimations using new instruments, including mailing speeds and telegram charges around 1900. The magnitude of the effect of backwardness on social tension increases in the two-stage least-squares estimations.
Complex networks theory for analyzing metabolic networks
One of the main tasks of post-genomic informatics is to systematically
investigate all molecules and their interactions within a living cell so as to
understand how these molecules and the interactions between them relate to the
function of the organism, while networks are an appropriate abstract description
of all kinds of interactions. In the past few years, great progress has been
made in developing the theory of complex networks for revealing the organizing
principles that govern the formation and evolution of various complex
biological, technological and social networks. This paper reviews the
accomplishments in constructing genome-based metabolic networks and describes
how the theory of complex networks is applied to analyze metabolic networks.
Comment: 13 pages, 2 figures
Political Regimes and Sovereign Credit Risk in Europe, 1750-1913
This article uses a new panel data set to perform a statistical analysis of political regimes and sovereign credit risk in Europe from 1750 to 1913. Old Regime polities typically suffered from fiscal fragmentation and absolutist rule. By the start of World War I, however, many such countries had centralized institutions and limited government. Panel regressions indicate that centralized and/or limited regimes were associated with significant improvements in credit risk relative to fragmented and absolutist ones. Structural break tests also reveal close relationships between major turning points in yield series and political transformations.
Protein folding using contact maps
We present the development of the idea to use dynamics in the space of
contact maps as a computational approach to the protein folding problem. We
first introduce two important technical ingredients, the reconstruction of a
three dimensional conformation from a contact map and the Monte Carlo dynamics
in contact map space. We then discuss two approximations to the free energy of
the contact maps and a method to derive energy parameters based on perceptron
learning. Finally we present results, first for predictions based on threading
and then for energy minimization of crambin and of a set of 6 immunoglobulins.
The main result is that we proved that the two simple approximations we studied
for the free energy are not suitable for protein folding. Perspectives are
discussed in the last section.
Comment: 29 pages, 10 figures