Measuring gene similarity by means of the classification distance

Author: A Ben-Dor
A Statnikov
A Thalamuthu
Alessandro Fiori
BS Everitt
CC Chang
D Huang
D Jiang
D Jiang
Elena Baralis
FR Hampel
G Petrovics
Giulia Bruno
H Liu
J Gu
JJ Chen
JL Gregg
L Davies
L Fu
L Kaufman
L Wang
M Bouguessa
M Daszykowski
M Royuela
O Gevaert
P Rosini
P Yang
PR Bushel
RC Thompson
S Datta
S Mukkamala
SB Aicha
T Bo
T Chu
TF Cox
TR Golub
U Alon
WM Rand
X He
Y Torosyan
YH Yang
Publication venue: Springer London
Publication date: 01/01/2011

Crossref

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Comparison and validation of community structures in complex networks

Author: Anna Lombardi
Ashburner
Azuaje
Bolshakova
Danon
Duch
Evans
Fisher
Girvan
Guimera
Gusfield
Jaccard
Maslov
Massen
Michael Hörnquist
Mika Gustafsson
Milligan
Newman
Rives
Rousseeuw
Stanley
Strehl
Zachary
Zhou
Publication venue: 'Elsevier BV'
Publication date: 10/01/2006

The issue of partitioning a network into communities has attracted a great deal of attention recently. Most authors seem to equate this issue with that of finding the maximum value of the modularity, as defined by Newman. Since the problem formulated this way is NP-hard, most effort has gone into the construction of search algorithms, and less into the question of other measures of community structure, similarities between various partitionings, and validation with respect to external information. Here we concentrate on a class of computer-generated networks and on three well-studied real networks which constitute a benchmark for network studies: the karate club, the US college football teams, and a gene network of yeast. We utilize some standard ways of clustering data (originally not designed for finding community structures in networks) and show that these classical methods sometimes outperform the newer ones. We discuss various measures of the strength of the modular structure, and show by example their features and drawbacks. Further, we compare different partitions by applying some graph-theoretic concepts of distance, which indicate that one of the quality measures of the degree of modularity corresponds quite well with the distance from the true partition. Finally, we introduce a way to validate the partitionings with respect to external data when the nodes are classified but the network structure is unknown. This is possible here since we know everything about the computer-generated networks, as well as the historical answer to how the karate club and the football teams are partitioned in reality. The partitioning of the gene network is validated by use of the Gene Ontology database, where we show that a community in general corresponds to a biological process.

Comment: To appear in Physica A; 25 page
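As a rough illustration of the modularity mentioned in this abstract, the sketch below computes Newman's Q for a given partition of an undirected, unweighted graph, using the standard form Q = sum over communities of (e_c / m) - (d_c / 2m)^2, where e_c is the number of edges inside community c, d_c is the total degree of its nodes, and m is the number of edges. The function name `modularity` and the toy graph are assumptions made for this example; they are not taken from the paper.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman modularity Q of a partition (illustrative sketch).

    edges: iterable of (u, v) pairs of an undirected, unweighted graph,
           assumed to contain no self-loops and no duplicate edges.
    community: dict mapping each node to its community label.
    """
    edges = list(edges)
    m = len(edges)
    if m == 0:
        raise ValueError("graph has no edges")

    intra = defaultdict(int)       # e_c: edges with both ends in community c
    degree_sum = defaultdict(int)  # d_c: total degree of nodes in community c
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            intra[cu] += 1

    return sum(intra[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in degree_sum)


# Hypothetical toy graph: two triangles joined by a single edge.
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {node: "all" for node in range(6)}

q_split = modularity(toy_edges, two_groups)
q_single = modularity(toy_edges, one_group)
```

Placing every node in one community gives Q = 0 by construction, so a positive value for the two-triangle split indicates more internal edges than expected at random under this measure.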

arXiv.org e-Print Archive

Crossref

CERN Document Server

Topological network alignment uncovers biological function and phylogeny

Author: Cook S.
Flannick J.
Kuchaiev O.
Memišević V.
Nataša Pržulj
Oleksii Kuchaiev
Pržulj N.
Singh R.
Singh R.
Snijders T. A.
Tijana Milenković
Vesna Memišević
Wayne Hayes
Wentz-Hunter K.
Zhang Y.
Publication date: 07/10/2009

Sequence comparison and alignment has had an enormous impact on our understanding of evolution, biology, and disease. Comparison and alignment of biological networks will likely have a similar impact. Existing network alignments use information external to the networks, such as sequence, because no good algorithm for purely topological alignment has yet been devised. In this paper, we present a novel algorithm, based solely on network topology, that can be used to align any two networks. We apply it to biological networks to produce by far the most complete topological alignments of biological networks to date. We demonstrate that both species phylogeny and detailed biological function of individual proteins can be extracted from our alignments. Topology-based alignments have the potential to provide a completely new, independent source of phylogenetic information. Our alignment of the protein-protein interaction networks of two very different species--yeast and human--indicates that even distant species share a surprising amount of network topology with each other, suggesting broad similarities in internal cellular wiring across all life on Earth.

Comment: Algorithm explained in more details. Additional analysis adde

arXiv.org e-Print Archive

Crossref

PubMed Central

UCL Discovery

Exact heat kernel on a hypersphere and its applications in kernel SVM

Author: Song Jun S.
Zhao Chenchao
Publication venue: 'Frontiers Media SA'
Publication date: 19/11/2017

Many contemporary statistical learning methods assume a Euclidean feature space. This paper presents a method for defining similarity based on hyperspherical geometry and shows that it often improves the performance of support vector machines compared to other competing similarity measures. Specifically, the idea of using heat diffusion on a hypersphere to measure similarity has been previously proposed, demonstrating promising results based on a heuristic heat kernel obtained from the zeroth-order parametrix expansion; however, how well this heuristic kernel agrees with the exact hyperspherical heat kernel remains unknown. This paper presents a higher-order parametrix expansion of the heat kernel on a unit hypersphere and discusses several problems associated with this expansion method. We then compare the heuristic kernel with an exact form of the heat kernel expressed in terms of a uniformly and absolutely convergent series in high-dimensional angular momentum eigenmodes. Being a natural measure of similarity between sample points dwelling on a hypersphere, the exact kernel often shows superior performance in kernel SVM classifications applied to text mining, tumor somatic mutation imputation, and stock market analysis.

arXiv.org e-Print Archive

Frontiers - Publisher Connector

Clustering Time Series from Mixture Polynomial Models with Discretised Data

Author: Bagnall AJ
Janacek GJ
Zhang M
Publication venue: University of East Anglia
Publication date: 01/01/2003

Clustering time series is an active research area with applications in many fields. One common feature of time series is the likely presence of outliers. These uncharacteristic data can significantly affect the quality of the clusters formed. This paper evaluates a method of overcoming the detrimental effects of outliers. We describe some of the alternative approaches to clustering time series, then specify a particular class of model for experimentation with k-means clustering and a correlation-based distance metric. For data derived from this class of model we demonstrate that discretising the data into a binary series of above and below the median improves the clustering when the data has outliers. More specifically, we show that, firstly, discretisation does not significantly affect the accuracy of the clusters when there are no outliers and, secondly, it significantly increases the accuracy in the presence of outliers, even when the probability of an outlier is very low.
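The discretisation step described in this abstract can be sketched as follows. This is only a minimal illustration, not the authors' implementation: the function name `binarise`, the choice to map values equal to the median to 0, and the sample series are all assumptions made for this example.

```python
from statistics import median


def binarise(series):
    """Map each value to 1 if it lies above the series median, else 0.

    Assumption for this sketch: values exactly equal to the median map to 0.
    """
    med = median(series)
    return [1 if x > med else 0 for x in series]


# Hypothetical data: the same series with and without one large outlier.
clean = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
spiked = [1.0, 2.0, 300.0, 4.0, 5.0, 6.0]

clean_bits = binarise(clean)
spiked_bits = binarise(spiked)

# Number of positions where the two binary series disagree.
disagreements = sum(a != b for a, b in zip(clean_bits, spiked_bits))
```

However large the outlier is, it contributes a single bit to the binary series, so its magnitude cannot dominate a distance computed afterwards on the discretised data.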

University of East Anglia digital repository