Implicit Measures of Lostness and Success in Web Navigation
In two studies, we investigated the ability of a variety of structural and temporal measures computed from a web navigation path to predict lostness and task success. The user’s task was to find requested target information on specified websites. The web navigation measures were based on counts of visits to web pages and other statistical properties of the web usage graph (such as compactness, stratum, and similarity to the optimal path). Subjective lostness was best predicted by similarity to the optimal path and time on task. The best overall predictor of success on individual tasks was similarity to the optimal path, but other predictors were sometimes superior depending on the particular web navigation task. These measures can be used to diagnose user navigational problems and to help identify problems in website design.
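As a rough illustration of the count-based measures mentioned above, the following sketch computes a few statistics from one navigation path. The function name `path_measures`, the page names, and the use of Jaccard overlap as a stand-in for "similarity to the optimal path" are assumptions made for this example; the abstract does not say how the study computed similarity.

```python
from collections import Counter


def path_measures(visited, optimal):
    """Return simple count-based measures for one navigation path.

    `visited` is the sequence of pages the user opened; `optimal` is the
    shortest known sequence that reaches the target. Both are hypothetical
    inputs for this sketch.
    """
    counts = Counter(visited)
    total = len(visited)
    distinct = len(counts)
    # Share of page views that returned to an already-seen page.
    revisit_ratio = (total - distinct) / total if total else 0.0
    # Assumed similarity: Jaccard overlap of the two page sets.
    seen, best = set(visited), set(optimal)
    union = seen | best
    similarity = len(seen & best) / len(union) if union else 1.0
    return {
        "total": total,
        "distinct": distinct,
        "revisit_ratio": revisit_ratio,
        "similarity": similarity,
    }


if __name__ == "__main__":
    visited = ["home", "products", "home", "support", "faq"]
    optimal = ["home", "support", "faq"]
    print(path_measures(visited, optimal))
```

A set-based overlap ignores visit order; an order-aware measure (for example, one based on a longest common subsequence) would be an equally plausible choice.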
Multilayer Network of Language: a Unified Framework for Structural Analysis of Linguistic Subsystems
Recently, the focus of complex networks research has shifted from the
analysis of isolated properties of a system toward a more realistic modeling of
multiple phenomena - multilayer networks. Motivated by the success of the
multilayer approach in social, transport or trade systems, we propose the
introduction of multilayer networks for language. The multilayer network of
language is a unified framework for modeling linguistic subsystems and their
structural properties enabling the exploration of their mutual interactions.
Various aspects of natural language systems can be represented as complex
networks, whose vertices depict linguistic units, while links model their
relations. The multilayer network of language is defined by three aspects: the
network construction principle, the linguistic subsystem and the language of
interest. More precisely, we construct word-level (syntax, co-occurrence and
its shuffled counterpart) and subword-level (syllables and graphemes) network
layers from five variations of the original text (in the modeled language). The
obtained results suggest that there are substantial differences between the
network structures of different language subsystems, which are hidden during
the exploration of an isolated layer. The word-level layers share structural
properties regardless of the language (e.g. Croatian or English), while the
syllabic subword level expresses more language-dependent structural properties.
The preserved weighted overlap quantifies the similarity of word-level layers
in weighted and directed networks. Moreover, the analysis of motifs reveals a
close topological structure of the syntactic and syllabic layers for both
languages. The findings corroborate that the multilayer network framework is a
powerful, consistent and systematic approach to model several linguistic
subsystems simultaneously and hence to provide a more unified view on language.
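To make the idea of layers concrete, here is a minimal sketch that builds a word co-occurrence layer, its shuffled counterpart, and a grapheme layer from the same text, then compares their edge sets. The function names, the toy sentence, and the plain Jaccard edge overlap are assumptions for illustration; this is not the preserved weighted overlap used in the paper, and syntax and syllable layers are omitted because they need a parser and a syllabifier.

```python
import random
from collections import Counter


def cooccurrence_layer(words):
    """Directed, weighted edges between adjacent words."""
    return Counter(zip(words, words[1:]))


def grapheme_layer(words):
    """Directed, weighted edges between adjacent letters inside each word."""
    edges = Counter()
    for word in words:
        edges.update(zip(word, word[1:]))
    return edges


def edge_jaccard(layer_a, layer_b):
    """Share of distinct edges that two layers have in common (weights ignored)."""
    a, b = set(layer_a), set(layer_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


if __name__ == "__main__":
    words = "the cat sat on the mat the cat ran".split()
    shuffled = words[:]
    random.Random(0).shuffle(shuffled)  # fixed seed so the run is repeatable

    original = cooccurrence_layer(words)
    baseline = cooccurrence_layer(shuffled)
    print("co-occurrence edges:", len(original))
    print("overlap with shuffled layer:", edge_jaccard(original, baseline))
    print("grapheme edges:", len(grapheme_layer(words)))
```

Keeping each layer as a `Counter` of edge pairs makes it easy to compare layers built over the same vertex set, which is the situation for the co-occurrence layer and its shuffled counterpart.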
Exploring the benefit of rerouting multi-period traffic to multi-site data centers
In cloud-like scenarios, demand is served at one of multiple possible data center (DC) destinations. Usually, the exact DC that is used can be freely chosen, which leads to an anycast routing problem. Furthermore, the demand volume is expected to change over time, e.g., following a diurnal pattern. Given that virtually all application domains today rely heavily on cloud-like services, it is important that the backbone networks connecting users to the DCs are resilient against failures. In this paper, we consider the problem of resiliently routing multi-period traffic: we need to find routes to both a primary DC and a backup DC (to be used in the case of failure of the primary one, or of the network connection to it), and also account for synchronization traffic between the primary and backup DCs. We formulate this as an optimization problem and adopt column generation, using a path formulation in two sub-problems: the (restricted) master problem selects "configurations" to use for each demand in each of the time epochs it lasts, while the pricing problem (PP) constructs a new "configuration" that can lead to lower overall costs (which we express as the number of network resources, i.e., bandwidth, required to serve the demand). Here, a "configuration" is defined by the network paths followed from the demand source to each of the two selected DCs, as well as that of the synchronization traffic in between the DCs. Our decomposition allows for PPs to be solved in parallel, for which we quantitatively explore the reduction in the time required to solve the overall routing problem. 
The key question that we address with our model is the potential benefit of rerouting traffic from one time epoch to the next: we compare several (re)routing strategies, allowing traffic that spans multiple time periods to i) not be rerouted in different periods, ii) only change the backup DC and routes, or iii) freely change both primary and backup DC choices and the routes toward them.
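The notion of a "configuration" (a path to the primary DC, a path to the backup DC, and a synchronization path between the two) can be sketched for a single demand in a single time epoch as follows. This is a brute-force illustration under simplifying assumptions, not the column generation method of the paper: it uses hop counts as the bandwidth cost, ignores link capacities, and does not force the primary and backup paths to be disjoint. The names `shortest_path`, `cheapest_configuration`, and the toy topology are hypothetical.

```python
from collections import deque
from itertools import permutations


def shortest_path(graph, src, dst):
    """Breadth-first search; returns the node list of a fewest-hop path or None."""
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in graph[node]:
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None


def cheapest_configuration(graph, source, dcs, demand, sync):
    """Try every ordered (primary, backup) DC pair and keep the cheapest one.

    Cost = demand * (hops to primary + hops to backup) + sync * (hops between DCs).
    """
    best = None
    for primary, backup in permutations(dcs, 2):
        paths = [
            shortest_path(graph, source, primary),
            shortest_path(graph, source, backup),
            shortest_path(graph, primary, backup),
        ]
        if any(p is None for p in paths):
            continue  # this pair cannot be served on the given topology
        hops = [len(p) - 1 for p in paths]
        cost = demand * (hops[0] + hops[1]) + sync * hops[2]
        if best is None or cost < best[0]:
            best = (cost, primary, backup, paths)
    return best


if __name__ == "__main__":
    graph = {
        "u": ["a", "b"],
        "a": ["u", "d1"],
        "b": ["u", "d2"],
        "d1": ["a", "d2"],
        "d2": ["b", "d1"],
    }
    print(cheapest_configuration(graph, "u", ["d1", "d2"], demand=2, sync=1))
```

Running this once per time epoch with the epoch's demand volume corresponds loosely to strategy iii), where both DC choices and routes may change freely between periods.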
Global disease monitoring and forecasting with Wikipedia
Infectious disease is a leading threat to public health, economic stability,
and other key social structures. Efforts to mitigate these impacts depend on
accurate and timely monitoring to measure the risk and progress of disease.
Traditional, biologically-focused monitoring techniques are accurate but costly
and slow; in response, new techniques based on social internet data such as
social media and search queries are emerging. These efforts are promising, but
important challenges in the areas of scientific peer review, breadth of
diseases and countries, and forecasting hamper their operational usefulness.
We examine a freely available, open data source for this use: access logs
from the online encyclopedia Wikipedia. Using linear models, language as a
proxy for location, and a systematic yet simple article selection procedure, we
tested 14 location-disease combinations and demonstrate that these data
feasibly support an approach that overcomes these challenges. Specifically, our
proof-of-concept yields models with $r^2$ up to 0.92, forecasting value up to
the 28 days tested, and several pairs of models similar enough to suggest that
transferring models from one location to another without re-training is
feasible.
Based on these preliminary results, we close with a research agenda designed
to overcome these challenges and produce a disease monitoring and forecasting
system that is significantly more effective, robust, and globally comprehensive
than the current state of the art.
Comment: 27 pages; 4 figures; 4 tables. Version 2: Cite McIver & Brownstein
and adjust novelty claims accordingly; revise title; various revisions for
clarit
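The linear-model approach described in this abstract can be illustrated with a one-predictor least-squares fit and its coefficient of determination. The sketch below is an assumption-laden toy: the series are made up, a single article's view counts stand in for the paper's article selection procedure, and the helper names (`lagged_pairs`, `fit_line`, `r_squared`) are hypothetical.

```python
def lagged_pairs(views, cases, lag):
    """Pair article views on day t with case counts on day t + lag."""
    if lag < 0 or lag >= len(views):
        raise ValueError("lag must be between 0 and len(views) - 1")
    return views[: len(views) - lag], cases[lag:]


def fit_line(x, y):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(x, y, slope, intercept):
    """Coefficient of determination of the fitted line on (x, y)."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Made-up daily series, for illustration only.
    views = [120, 135, 160, 210, 260, 240, 200]
    cases = [10, 11, 13, 18, 24, 23, 19]
    for lag in (0, 1, 2):
        x, y = lagged_pairs(views, cases, lag)
        slope, intercept = fit_line(x, y)
        print(lag, round(r_squared(x, y, slope, intercept), 3))
```

Fitting the same model at increasing lags is one simple way to probe forecasting value: a lag of k days asks how well today's views explain case counts k days later.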
The power of indirect social ties
While direct social ties have been intensely studied in the context of
computer-mediated social networks, indirect ties (e.g., friends of friends)
have seen little attention. Yet in real life, we often rely on friends of our
friends for recommendations (of good doctors, good schools, or good
babysitters), for introduction to a new job opportunity, and for many other
occasional needs. In this work we attempt to 1) quantify the strength of
indirect social ties, 2) validate it, and 3) empirically demonstrate its
usefulness for distributed applications on two examples. We quantify social
strength of indirect ties using a(ny) measure of the strength of the direct
ties that connect two people and the intuition provided by the sociology
literature. We validate the proposed metric experimentally by comparing
correlations with other direct social tie evaluators. We show via data-driven
experiments that the proposed metric for social strength can be used
successfully for social applications. Specifically, we show that it alleviates
known problems in friend-to-friend storage systems by addressing two previously
documented shortcomings: reduced set of storage candidates and data
availability correlations. We also show that it can be used for predicting the
effects of a social diffusion with an accuracy of up to 93.5%.
Comment: Technical Repor
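One way to picture a strength score for a friend-of-friend tie, built only from direct tie strengths, is sketched below. The formula is an assumption chosen for illustration (each two-hop path is as strong as its weaker link, and paths through different mutual friends add up); the abstract does not give the paper's actual metric. The function name and the toy graph are hypothetical.

```python
def indirect_strength(weights, a, b):
    """Sum, over mutual friends k, of the weaker of the ties a-k and k-b.

    `weights` maps each person to a dict of {friend: direct tie strength}
    and is assumed to be symmetric.
    """
    mutual = set(weights.get(a, {})) & set(weights.get(b, {}))
    return sum(min(weights[a][k], weights[b][k]) for k in mutual)


if __name__ == "__main__":
    # Toy graph with made-up strengths (e.g. interaction counts).
    weights = {
        "ann": {"bob": 3, "cat": 1},
        "bob": {"ann": 3, "dan": 2},
        "cat": {"ann": 1, "dan": 4},
        "dan": {"bob": 2, "cat": 4},
    }
    print(indirect_strength(weights, "ann", "dan"))  # paths via bob and cat
```

Taking the minimum along a path encodes the intuition that a chain is only as strong as its weakest tie, while summing over mutual friends rewards pairs that are connected through several people.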