Extracting tag hierarchies

Author: Palla Gergely
Pollner Péter
Tibély Gergely
Vicsek Tamás
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2013

Tagging items with descriptive annotations or keywords is a natural way to compress and highlight information about the properties of a given entity. Over the years, several methods have been proposed for extracting a hierarchy between the tags in systems with a "flat", egalitarian organization of the tags, which is very common when the tags are free words chosen by numerous independent people. Here we present a complete framework for automated tag hierarchy extraction based on tag occurrence statistics. Along with proposing new algorithms, we also introduce different quality measures that enable a detailed comparison of competing approaches from different aspects. Furthermore, we set up a synthetic, computer-generated benchmark that provides a versatile tool for testing, with a couple of tunable parameters capable of generating a wide range of test beds. Besides the computer-generated input, we also use real data in our studies, including a biological example with a pre-defined hierarchy between the tags. The encouraging similarity between the pre-defined and reconstructed hierarchies, as well as the seemingly meaningful hierarchies obtained for other real systems, indicates that tag hierarchy extraction is a very promising direction for further research with great potential for practical applications.

Comment: 25 pages with 21 pages of supporting information, 25 figures
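To make the idea of extracting a hierarchy from tag occurrence statistics concrete, here is a minimal Python sketch. It is not the algorithm proposed in the paper, which the abstract does not specify. It is a simple illustrative heuristic that assumes each tag's parent is a strictly more frequent tag with which it co-occurs most often, with ties broken toward the less frequent (more specific) candidate. The function name `extract_hierarchy` and the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def extract_hierarchy(tagged_items):
    """Return a {tag: parent_or_None} map built from co-occurrence counts.

    Illustrative heuristic only (an assumption, not the paper's method):
    a tag's parent is a strictly more frequent tag it co-occurs with,
    chosen by highest co-occurrence count.
    """
    freq = Counter()   # number of items carrying each tag
    cooc = Counter()   # number of items carrying both tags of an ordered pair
    for tags in tagged_items:
        unique = sorted(set(tags))
        freq.update(unique)
        for a, b in combinations(unique, 2):
            cooc[(a, b)] += 1
            cooc[(b, a)] += 1

    parent = {}
    for tag in freq:
        # Requiring a strictly higher frequency rules out cycles.
        candidates = [
            other for other in freq
            if freq[other] > freq[tag] and cooc[(tag, other)] > 0
        ]
        if candidates:
            # Highest co-occurrence wins; ties go to the rarer candidate,
            # then to the alphabetically larger name for determinism.
            parent[tag] = max(
                candidates,
                key=lambda o: (cooc[(tag, o)], -freq[o], o),
            )
        else:
            parent[tag] = None  # no more general tag found: a root
    return parent


# Hypothetical toy data: each inner list holds the tags on one item.
items = [
    ["science", "physics"],
    ["science", "biology"],
    ["science", "physics", "optics"],
    ["science", "biology", "genetics"],
    ["physics", "optics"],
]
hierarchy = extract_hierarchy(items)
for tag, par in sorted(hierarchy.items()):
    print(tag, "<-", par)
```

Because a parent must be strictly more frequent than its child, the result is always a forest. A real method would also need a rule for ties in frequency and some way to score the quality of the resulting hierarchy.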

arXiv.org e-Print Archive

Directory of Open Access Journals

PubMed Central

ELTE Digital Institutional Repository (EDIT)

FigShare

Statistical mechanics of ontology based annotations

Author: Brass Andrew
Hoyle David C.
Publication venue: 'Elsevier BV'
Publication date: 15/01/2016

We present a statistical mechanical theory of the process of annotating an object with terms selected from an ontology. The term selection process is formulated as an ideal lattice gas model, but in a highly structured inhomogeneous field. The model enables us to explain patterns recently observed in real-world annotation data sets in terms of the underlying graph structure of the ontology. By relating the external field strengths to the information content of each node in the ontology graph, the statistical mechanical model also allows us to propose a number of practical metrics for assessing the quality of both the ontology and the annotations that arise from its use. Using the statistical mechanical formalism we also study an ensemble of ontologies of differing size and complexity, an analysis not readily performed using real data alone. Focusing on regular tree ontology graphs, we uncover a rich set of scaling laws describing the growth in the optimal ontology size as the number of objects being annotated increases. In doing so we provide a further possible measure for the assessment of ontologies.

Comment: 27 pages, 5 figures

arXiv.org e-Print Archive

The University of Manchester - Institutional Repository

Comparing the hierarchy of author given tags and repository given tags in a large document archive

Author: Palla Gergely
Pollner Péter
Tibély Gergely
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 30/06/2015

Folksonomies - large databases arising from collaborative tagging of items by independent users - are becoming an increasingly important way of categorizing information. In these systems users can tag items with free words, resulting in a tripartite item-tag-user network. Although there are no prescribed relations between tags, the way users think about the different categories presumably has some built-in hierarchy, in which more specialized concepts are descendants of more general categories. Several applications would benefit from knowledge of this hierarchy. Here we apply a recent method to check the differences and similarities between hierarchies resulting from tags given by independent individuals and from tags given by a centrally managed repository system. The results from our method show substantial differences between the lower parts of the hierarchies and, in contrast, a relatively high similarity at the top of the hierarchies.

Comment: 10 pages

arXiv.org e-Print Archive

EDP Sciences OAI-PMH repository (1.2.0)

Comparing the hierarchy of keywords in on-line news portals

Author: David Sousa-Rodrigues
Gergely Palla
Gergely Tibély
Péter Pollner
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2016

The tagging of on-line content with informative keywords is a widespread phenomenon, from scientific article repositories through blogs to on-line news portals. In most cases, the tags on a given item are free words chosen by the authors independently. Therefore, the relations among keywords in a collection of news items are unknown. However, in most cases the topics and concepts described by these keywords form a latent hierarchy, with the more general topics and categories at the top and the more specialised ones at the bottom. Here we apply a recent, co-occurrence-based tag hierarchy extraction method to sets of keywords obtained from four different on-line news portals. The resulting hierarchies show substantial differences not just in the topics rendered as important (being at the top of the hierarchy) or of less interest (categorised low in the hierarchy), but also in the underlying network structure. This reveals discrepancies between the plausible keyword association frameworks in the studied news portals.

arXiv.org e-Print Archive

Crossref

Directory of Open Access Journals

PubMed Central

ELTE Digital Institutional Repository (EDIT)

FigShare

Clustering of tag-induced sub-graphs in complex networks

Author: Gergely Palla
Péter Pollner
Tamás Vicsek
Publication venue: 'Elsevier BV'
Publication date: 30/05/2012

We study the behavior of the clustering coefficient in tagged networks. The rich variety of tags associated with the nodes in the studied systems provides additional information about the entities represented by the nodes, which can be important for practical applications such as searching in the networks. Here we examine how the clustering coefficient changes when narrowing the network to a sub-graph marked by a given tag, and how it correlates with various other properties of the sub-graph. Another interesting question addressed in the paper is how the clustering coefficient of the individual nodes is affected by the tags on the node. We believe this sort of analysis helps in acquiring a more complete description of the structure of large complex systems.
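As an illustration of narrowing a network to a tag-induced sub-graph and comparing clustering coefficients, here is a minimal Python sketch. It uses the standard local clustering coefficient (the fraction of a node's neighbour pairs that are themselves linked) averaged over nodes, and it assigns 0 to nodes with fewer than two neighbours, which is an assumed convention. The helper names and the toy network are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbours)}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def local_clustering(adj, node):
    """Fraction of neighbour pairs of `node` that are linked to each other."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0  # assumed convention for nodes of degree 0 or 1
    links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))


def average_clustering(adj):
    """Mean of the local clustering coefficients over all nodes."""
    if not adj:
        return 0.0
    return sum(local_clustering(adj, n) for n in adj) / len(adj)


def tag_induced_subgraph(adj, node_tags, tag):
    """Keep only nodes carrying `tag` and the links among them."""
    keep = {n for n in adj if tag in node_tags.get(n, ())}
    return {n: adj[n] & keep for n in keep}


# Hypothetical toy network: a triangle a-b-c with a tail c-d-e.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("d", "e")]
node_tags = {
    "a": {"x"}, "b": {"x"}, "c": {"x", "y"}, "d": {"y"}, "e": {"y"},
}
adj = build_adjacency(edges)
print("full network:", average_clustering(adj))
for tag in ("x", "y"):
    sub = tag_induced_subgraph(adj, node_tags, tag)
    print("tag", tag, ":", average_clustering(sub))
```

In this toy case the nodes tagged "x" form a closed triangle while those tagged "y" form an open path, so the two sub-graphs sit at opposite extremes of clustering relative to the full network.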

arXiv.org e-Print Archive

Crossref

Information-theoretic inference of common ancestors

Author: Ay Nihat
Steudel Bastian
Publication venue
Publication date: 01/01/2010

A directed acyclic graph (DAG) partially represents the conditional independence structure among observations of a system if the local Markov condition holds, that is, if every variable is independent of its non-descendants given its parents. In general, there is a whole class of DAGs that represents a given set of conditional independence relations. We are interested in properties of this class that can be derived from observations of a subsystem only. To this end, we prove an information-theoretic inequality that allows for the inference of common ancestors of observed parts in any DAG representing some unknown larger system. More explicitly, we show that a large amount of dependence, in terms of mutual information among the observations, implies the existence of a common ancestor that distributes this information. Within the causal interpretation of DAGs, our result can be seen as a quantitative extension of Reichenbach's Principle of Common Cause to more than two variables. Our conclusions are also valid for non-probabilistic observations such as binary strings, since we state the proof for an axiomatized notion of mutual information that includes the stochastic as well as the algorithmic version.

Comment: 18 pages, 4 figures

arXiv.org e-Print Archive

CiteSeerX