96 research outputs found
Sampling properties of directed networks
For many real-world networks, only a small "sampled" version of the original
network can be investigated; results from the sample are then used to draw
conclusions about the actual system. Variants of breadth-first search (BFS)
sampling, which are based on epidemic processes, are widely used. Although it
is well established that BFS sampling fails, in most cases, to capture the
IN-component(s) of directed networks, a description of the effects of BFS
sampling on other topological properties is all but absent from the
literature. To systematically study the effects of sampling biases on directed
networks, we compare BFS sampling to random sampling on complete large-scale
directed networks. We present new results and a thorough analysis of the
topological properties of seven different complete directed networks (prior to
sampling), including three versions of Wikipedia, three different sources of
sampled World Wide Web data, and an Internet-based social network. We detail
the differences that sampling method and coverage can make to the structural
properties of sampled versions of these seven networks. Most notably, we find
that sampling method and coverage affect both the bow-tie structure, as well as
the number and structure of strongly connected components in sampled networks.
In addition, at low sampling coverage (i.e. less than 40%), the values of
average degree, variance of out-degree, degree auto-correlation, and link
reciprocity are overestimated by 30% or more in BFS-sampled networks, and only
attain values within 10% of the corresponding values in the complete networks
when sampling coverage is in excess of 65%. These results may cause us to
rethink what we know about the structure, function, and evolution of real-world
directed networks.
Comment: 21 pages, 11 figures
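The sampling contrast described above can be sketched in a few lines. The adjacency-dict representation, function names, and parameters below are illustrative assumptions, not the paper's actual code:

```python
# Sketch: BFS (epidemic-style) sampling vs. uniform random-node sampling
# of a directed graph. Graph format: {node: list_of_out_neighbors}.
from collections import deque
import random

def bfs_sample(graph, seed, max_nodes):
    """Collect up to max_nodes nodes reachable from `seed` via out-links.
    Like an epidemic process, this never discovers nodes that only point
    *toward* the seed, i.e. it misses the IN-component."""
    visited = {seed}
    queue = deque([seed])
    while queue and len(visited) < max_nodes:
        node = queue.popleft()
        for nbr in graph.get(node, ()):
            if nbr not in visited:
                visited.add(nbr)
                queue.append(nbr)
                if len(visited) >= max_nodes:
                    break
    return visited

def random_sample(graph, max_nodes, rng=random):
    """Uniform random node sample of the same size, for comparison."""
    nodes = list(graph)
    return set(rng.sample(nodes, min(max_nodes, len(nodes))))
```

On the toy graph `{'a': ['b', 'c'], 'b': ['d'], 'c': [], 'd': [], 'e': ['a']}`, a BFS sample seeded at `a` can never reach `e`, whereas a random sample of the node set can, which is the IN-component bias the abstract refers to.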
Index ordering by query-independent measures
Conventional approaches to information retrieval search through all applicable entries in an inverted file for a particular collection in order to find those documents with the highest scores. For particularly large collections, this may be extremely time-consuming.
A solution to this problem is to search only a limited portion of the collection at query time, in order to speed up the retrieval process while limiting the loss in retrieval efficacy (in terms of accuracy of results). We achieve this by first identifying the most “important” documents within the collection, and sorting the documents within each inverted-file list in order of this “importance”. In this way we limit the amount of information to be searched at query time by eliminating documents of lesser importance, which not only makes the search more efficient, but also limits the loss in retrieval accuracy. Our experiments, carried out on the TREC Terabyte collection, report significant savings in the number of postings examined, without significant loss of effectiveness, for several measures of importance used both in isolation and in combination. Our results point to several ways in which the computational cost of searching large collections of documents can be significantly reduced.
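The idea of importance-ordered inverted lists with a per-list examination budget can be sketched as follows; the importance scores, ranking function, and all names here are illustrative assumptions, not the paper's actual method:

```python
# Sketch: sort each inverted list by a query-independent importance score,
# then examine only the top `budget` postings per list at query time.

def build_index(docs, importance):
    """docs: {doc_id: set_of_terms}; importance: {doc_id: score}.
    Returns {term: [doc_ids sorted by descending importance]}."""
    index = {}
    for doc_id, terms in docs.items():
        for term in terms:
            index.setdefault(term, []).append(doc_id)
    for postings in index.values():
        postings.sort(key=lambda d: importance[d], reverse=True)
    return index

def search(index, query_terms, budget):
    """Examine at most `budget` postings per list; the score is simply
    the number of query terms matched (a toy ranking function)."""
    scores = {}
    for term in query_terms:
        for doc_id in index.get(term, [])[:budget]:
            scores[doc_id] = scores.get(doc_id, 0) + 1
    return sorted(scores, key=lambda d: -scores[d])
```

With a small budget, high-importance documents still dominate the results while most postings are never touched, which is the efficiency/effectiveness trade-off the abstract measures.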
Homophily in the Digital World: A LiveJournal Case Study
Special issue on Social Computing in Blogosphere (IC)
- …