2,321 research outputs found
Small-World File-Sharing Communities
Web caches, content distribution networks, peer-to-peer file sharing
networks, distributed file systems, and data grids all have in common that they
involve a community of users who generate requests for shared data. In each
case, overall system performance can be improved significantly if we can first
identify and then exploit interesting structure within a community's access
patterns. To this end, we propose a novel perspective on file sharing based on
the study of the relationships that form among users based on the files in
which they are interested.
We propose a new structure that captures common user interests in data--the
data-sharing graph-- and justify its utility with studies on three
data-distribution systems: a high-energy physics collaboration, the Web, and
the Kazaa peer-to-peer network. We find small-world patterns in the
data-sharing graphs of all three communities. We analyze these graphs and
propose some probable causes for these emergent small-world patterns. The
significance of small-world patterns is twofold: it provides a rigorous support
to intuition and, perhaps most importantly, it suggests ways to design
mechanisms that exploit these naturally emerging patterns
Event detection and user interest discovering in social media data streams
Social media plays an increasingly important role in people’s life. Microblogging is a form of social media which allows people to share and disseminate real-life events. Broadcasting events in microblogging networks can be an effective method of creating awareness, divulging important information and so on. However, many existing approaches at dissecting the information content primarily discuss the event detection model and ignore the user interest which can be discovered during event evolution. This leads to difficulty in tracking the most important events as they evolve including identifying the influential spreaders. There is further complication given that the influential spreaders interests will also change during event evolution. The influential spreaders play a key role in event evolution and this has been largely ignored in traditional event detection methods. To this end, we propose a user-interest model based event evolution model, named the HEE (Hot Event Evolution) model. This model not only considers the user interest distribution, but also uses the short text data in the social network to model the posts and the recommend methods to discovering the user interests. This can resolve the problem of data sparsity, as exemplified by many existing event detection methods, and improve the accuracy of event detection. A hot event automatic filtering algorithm is initially applied to remove the influence of general events, improving the quality and efficiency of mining the event. Then an automatic topic clustering algorithm is applied to arrange the short texts into clusters with similar topics. An improved user-interest model is proposed to combine the short texts of each cluster into a long text document simplifying the determination of the overall topic in relation to the interest distribution of each user during the evolution of important events. Finally a novel cosine measure based event similarity detection method is used to assess correlation between events thereby detecting the process of event evolution. The experimental results on a real Twitter dataset demonstrate the efficiency and accuracy of our proposed model for both event detection and user interest discovery during the evolution of hot events.N/
Towards a Theory of Scale-Free Graphs: Definition, Properties, and Implications (Extended Version)
Although the ``scale-free'' literature is large and growing, it gives neither
a precise definition of scale-free graphs nor rigorous proofs of many of their
claimed properties. In fact, it is easily shown that the existing theory has
many inherent contradictions and verifiably false claims. In this paper, we
propose a new, mathematically precise, and structural definition of the extent
to which a graph is scale-free, and prove a series of results that recover many
of the claimed properties while suggesting the potential for a rich and
interesting theory. With this definition, scale-free (or its opposite,
scale-rich) is closely related to other structural graph properties such as
various notions of self-similarity (or respectively, self-dissimilarity).
Scale-free graphs are also shown to be the likely outcome of random
construction processes, consistent with the heuristic definitions implicit in
existing random graph approaches. Our approach clarifies much of the confusion
surrounding the sensational qualitative claims in the scale-free literature,
and offers rigorous and quantitative alternatives.Comment: 44 pages, 16 figures. The primary version is to appear in Internet
Mathematics (2005
Finding Person Relations in Image Data of the Internet Archive
The multimedia content in the World Wide Web is rapidly growing and contains
valuable information for many applications in different domains. For this
reason, the Internet Archive initiative has been gathering billions of
time-versioned web pages since the mid-nineties. However, the huge amount of
data is rarely labeled with appropriate metadata and automatic approaches are
required to enable semantic search. Normally, the textual content of the
Internet Archive is used to extract entities and their possible relations
across domains such as politics and entertainment, whereas image and video
content is usually neglected. In this paper, we introduce a system for person
recognition in image content of web news stored in the Internet Archive. Thus,
the system complements entity recognition in text and allows researchers and
analysts to track media coverage and relations of persons more precisely. Based
on a deep learning face recognition approach, we suggest a system that
automatically detects persons of interest and gathers sample material, which is
subsequently used to identify them in the image data of the Internet Archive.
We evaluate the performance of the face recognition system on an appropriate
standard benchmark dataset and demonstrate the feasibility of the approach with
two use cases
- …