Statistical mechanics of ontology based annotations
We present a statistical mechanical theory of the process of annotating an
object with terms selected from an ontology. The term selection process is
formulated as an ideal lattice gas model, but in a highly structured
inhomogeneous field. The model enables us to explain patterns recently observed
in real-world annotation data sets, in terms of the underlying graph structure
of the ontology. By relating the external field strengths to the information
content of each node in the ontology graph, the statistical mechanical model
also allows us to propose a number of practical metrics for assessing the
quality of both the ontology, and the annotations that arise from its use.
Using the statistical mechanical formalism we also study an ensemble of
ontologies of differing size and complexity; an analysis not readily performed
using real data alone. Focusing on regular tree ontology graphs we uncover a
rich set of scaling laws describing the growth in the optimal ontology size as
the number of objects being annotated increases. In doing so we provide a
further possible measure for assessment of ontologies. Comment: 27 pages, 5 figures
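As a rough illustration of the lattice gas picture in this abstract, the sketch below treats each ontology term as an independent site whose mean occupancy depends on a local field. The abstract does not give the functional form, so the choice h_i = -ln p_i (field equal to information content), the toy term probabilities, and the function name `occupancy` are all assumptions made for illustration only.

```python
import math

def occupancy(field, mu=0.0, beta=1.0):
    """Mean occupancy of one site of a non-interacting (ideal) lattice gas."""
    return 1.0 / (1.0 + math.exp(beta * (field - mu)))

# Hypothetical toy ontology: term -> probability that an annotation uses it.
term_probability = {"root": 0.9, "binding": 0.4, "protein binding": 0.1}

# Assumption: the external field on a term equals its information content.
fields = {term: -math.log(p) for term, p in term_probability.items()}

# More informative (rarer) terms sit in a stronger field and are occupied less.
occupancies = {term: occupancy(h) for term, h in fields.items()}

for term, n in occupancies.items():
    print(f"{term}: field={fields[term]:.3f}, occupancy={n:.3f}")
```

With `beta=1` and `mu=0` this particular choice reduces to an occupancy of p/(1+p), so occupancy falls monotonically as the information content of a term rises.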
Revisiting Date and Party Hubs: Novel Approaches to Role Assignment in Protein Interaction Networks
The idea of 'date' and 'party' hubs has been influential in the study of
protein-protein interaction networks. Date hubs display low co-expression with
their partners, whilst party hubs have high co-expression. It was proposed that
party hubs are local coordinators whereas date hubs are global connectors. Here
we show that the reported importance of date hubs to network connectivity can
in fact be attributed to a tiny subset of them. Crucially, these few, extremely
central, hubs do not display particularly low expression correlation,
undermining the idea of a link between this quantity and hub function. The
date/party distinction was originally motivated by an approximately bimodal
distribution of hub co-expression; we show that this feature is not always
robust to methodological changes. Additionally, topological properties of hubs
do not in general correlate with co-expression. Thus, we suggest that a
date/party dichotomy is not meaningful and it might be more useful to conceive
of roles for protein-protein interactions rather than individual proteins. We
find significant correlations between interaction centrality and the functional
similarity of the interacting proteins. Comment: 27 pages, 5 main figures, 4 supplementary figures
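One way to picture the co-expression quantity discussed in this abstract is the average Pearson correlation between a hub's expression profile and those of its interaction partners. The sketch below assumes that definition; the abstract does not state the exact measure, and the expression profiles and the names `pearson` and `average_coexpression` are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def average_coexpression(hub_profile, partner_profiles):
    """Mean correlation between a hub and each of its interaction partners."""
    scores = [pearson(hub_profile, p) for p in partner_profiles]
    return sum(scores) / len(scores)

# Hypothetical expression profiles across four conditions.
hub = [1.0, 2.0, 3.0, 4.0]
partners = [[2.0, 4.0, 6.0, 8.0],   # perfectly co-expressed with the hub
            [4.0, 3.0, 2.0, 1.0]]   # perfectly anti-correlated with the hub

print(average_coexpression(hub, partners))
```

Under the date/party picture a low average would mark a date hub and a high average a party hub; the abstract argues that this split is not robust, so any threshold applied to such a score should be treated with caution.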
Ribosome traffic on mRNAs maps to gene ontology : genome-wide quantification of translation initiation rates and polysome size regulation
Peer reviewed; Publisher PDF
The architecture of the protein domain universe
Understanding the design of the universe of protein structures may provide
insights into protein evolution. We study the architecture of the protein
domain universe, which has been found to possess peculiar scale-free properties
(Dokholyan et al., Proc. Natl. Acad. Sci. USA 99: 14132-14136 (2002)). We
examine the origin of these scale-free properties of the graph of protein
domain structures (PDUG) and determine that the PDUG is not modular, i.e.
it does not consist of modules with uniform properties. Instead, we find the
PDUG to be self-similar at all scales. We further characterize the PDUG
architecture by studying the properties of the hub nodes that are responsible
for the scale-free connectivity of the PDUG. We introduce a measure of the
betweenness centrality of protein domains in the PDUG and find a power-law
distribution of the betweenness centrality values. The scale-free distribution
of hubs in the protein universe suggests that a set of specific statistical
mechanics models, such as the self-organized criticality model, can potentially
identify the principal driving forces of molecular evolution. We also find a
gatekeeper protein domain, removal of which partitions the largest cluster into
two large sub-clusters. We suggest that the loss of such gatekeeper protein
domains in the course of evolution is responsible for the creation of new fold
families. Comment: 14 pages, 3 figures
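The two graph notions used in this abstract, betweenness centrality and a gatekeeper node whose removal splits the largest cluster, can be sketched on a toy graph. The code below uses Brandes' algorithm for unweighted graphs as a stand-in; the abstract does not say how the authors computed betweenness, and the seven-node graph and all function names are hypothetical.

```python
from collections import deque

def betweenness(adj):
    """Brandes' algorithm for an unweighted, undirected graph."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        stack, preds = [], {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                      # BFS counting shortest paths
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:                      # accumulate dependencies backwards
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return {v: c / 2 for v, c in score.items()}  # each pair was counted twice

def component_sizes_without(adj, removed):
    """Sizes of the connected components left after deleting one node."""
    seen, sizes = {removed}, []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            v = queue.popleft()
            size += 1
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        sizes.append(size)
    return sorted(sizes, reverse=True)

# Hypothetical graph: two triangles joined through a single node "g".
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "g"),
         ("g", "d"), ("d", "e"), ("d", "f"), ("e", "f")]
adj = {}
for u, v in edges:
    adj.setdefault(u, []).append(v)
    adj.setdefault(v, []).append(u)

bc = betweenness(adj)
gatekeeper = max(bc, key=bc.get)
print(gatekeeper, bc[gatekeeper], component_sizes_without(adj, gatekeeper))
```

In this toy case the node with the highest betweenness is also the one whose removal splits the graph into two equal clusters, which mirrors the gatekeeper role described above. In a real domain graph the two properties need not coincide.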
Comparison and validation of community structures in complex networks
The issue of partitioning a network into communities has attracted a great
deal of attention recently. Most authors seem to equate this issue with the one
of finding the maximum value of the modularity, as defined by Newman. Since the
problem formulated this way is NP-hard, most effort has gone into the
construction of search algorithms, and less to the question of other measures
of community structures, similarities between various partitionings and the
validation with respect to external information. Here we concentrate on a class
of computer generated networks and on three well-studied real networks which
constitute a benchmark for network studies: the karate club, the US college
football teams and a gene network of yeast. We utilize some standard ways of
clustering data (originally not designed for finding community structures in
networks) and show that these classical methods sometimes outperform the newer
ones. We discuss various measures of the strength of the modular structure, and
show by example their features and drawbacks. Further, we compare different
partitions by applying some graph-theoretic concepts of distance, which
indicate that one of the quality measures of the degree of modularity
corresponds quite well with the distance from the true partition. Finally, we
introduce a way to validate the partitionings with respect to external data
when the nodes are classified but the network structure is unknown. This is
here possible since we know everything about the computer generated networks, as
well as the historical answer to how the karate club and the football teams are
partitioned in reality. The partitioning of the gene network is validated by
use of the Gene Ontology database, where we show that a community in general
corresponds to a biological process. Comment: To appear in Physica A; 25 pages
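The modularity that this abstract refers to is commonly written as Q = sum over communities of (L_c/m - (d_c/2m)^2), where m is the number of edges, L_c the number of edges inside community c, and d_c the total degree of its nodes. The sketch below evaluates that expression for a given partition; the toy graph, the node labels, and the function name `modularity` are hypothetical, and no search for the optimal partition is attempted.

```python
def modularity(edges, community_of):
    """Newman modularity of a partition of an unweighted, undirected graph."""
    m = len(edges)
    internal = {}    # community -> number of edges with both ends inside it
    degree_sum = {}  # community -> total degree of its nodes
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree_sum.items())

# Hypothetical graph: two triangles linked by the single edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]

two_groups = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
one_group = {node: "A" for node in range(6)}

q_split = modularity(edges, two_groups)
q_single = modularity(edges, one_group)
print(q_split, q_single)
```

Splitting the graph at the bridging edge gives a positive score while the trivial single-community partition scores zero, which is the behaviour that makes Q usable as an objective for comparing partitions.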
Making AI Meaningful Again
Artificial intelligence (AI) research enjoyed an initial period of enthusiasm in the 1970s and 80s. But this enthusiasm was tempered by a long interlude of frustration when genuinely useful AI applications failed to be forthcoming. Today, we are experiencing once again a period of enthusiasm, fired above all by the successes of the technology of deep neural networks or deep machine learning. In this paper we draw attention to what we take to be serious problems underlying current views of artificial intelligence encouraged by these successes, especially in the domain of language processing. We then show an alternative approach to language-centric AI, in which we identify a role for philosophy
- …