The Archive Query Log: Mining Millions of Search Result Pages of Hundreds of Search Engines from 25 Years of Web Archives
The Archive Query Log (AQL) is a previously unused, comprehensive query log
collected at the Internet Archive over the last 25 years. Its first version
includes 356 million queries, 166 million search result pages, and 1.7 billion
search results across 550 search providers. Although many query logs have been
studied in the literature, the search providers that own them generally do not
publish their logs to protect user privacy and vital business data. Of the few
query logs publicly available, none combines size, scope, and diversity. The
AQL is the first to do so, enabling research on new retrieval models and
(diachronic) search engine analyses. Provided in a privacy-preserving manner,
it promotes open research as well as more transparency and accountability in
the search industry. Comment: SIGIR 2023 resource paper, 13 pages
Conflict and Computation on Wikipedia: a Finite-State Machine Analysis of Editor Interactions
What is the boundary between a vigorous argument and a breakdown of
relations? What drives a group of individuals across it? Taking Wikipedia as a
test case, we use a hidden Markov model to approximate the computational
structure and social grammar of more than a decade of cooperation and conflict
among its editors. Across a wide range of pages, we discover a bursty war/peace
structure in which the system can become trapped, sometimes for months, in a
computational subspace associated with significantly higher levels of
conflict-tracking "revert" actions. Distinct patterns of behavior characterize
the lower-conflict subspace, including tit-for-tat reversion. While a fraction
of the transitions between these subspaces are associated with top-down actions
taken by administrators, the effects are weak. Surprisingly, we find no
statistical signal that transitions are associated with the appearance of
particularly anti-social users, and only weak association with significant news
events outside the system. These findings are consistent with transitions being
driven by decentralized processes with no clear locus of control. Models of
belief revision in the presence of a common resource for information-sharing
predict the existence of two distinct phases: a disordered high-conflict phase,
and a frozen phase with spontaneously broken symmetry. The bistability we
observe empirically may be a consequence of editor turnover, which drives the
system to a critical point between them. Comment: 23 pages, 3 figures. Matches published version. Code for HMM fitting
available at http://bit.ly/sfihmm ; time series and derived finite state
machines at bit.ly/wiki_hm
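The abstract does not give the model's states, parameters, or fitting procedure (the authors' own HMM code is at the link above). As a rough illustration of how a hidden Markov model can separate a bursty high-conflict regime from a low-conflict one, the sketch below decodes a sequence of edits, coded 1 for a revert and 0 otherwise, with the Viterbi algorithm. The two states, their "peace"/"war" labels, and every probability are hypothetical values chosen for illustration, not parameters fitted to Wikipedia data.

```python
import math

# Hypothetical two-state HMM. Hidden state 0 = "peace" (low conflict),
# state 1 = "war" (high conflict). Observation per edit: 1 = revert, 0 = other.
STATES = (0, 1)
START = (0.9, 0.1)           # assumed P(initial state)
TRANS = ((0.95, 0.05),       # assumed P(next state | peace)
         (0.10, 0.90))       # assumed P(next state | war): sticky, so regimes are bursty
EMIT = ((0.9, 0.1),          # assumed P(obs | peace): reverts are rare
        (0.4, 0.6))          # assumed P(obs | war): reverts dominate


def viterbi(obs):
    """Most likely hidden-state path for a non-empty list of 0/1 observations."""
    log = math.log
    # best[s]: log-probability of the best path that ends in state s so far
    best = [log(START[s]) + log(EMIT[s][obs[0]]) for s in STATES]
    back = []  # back[t][s]: best predecessor of state s at step t + 1
    for o in obs[1:]:
        prev, best, ptrs = best, [], []
        for s in STATES:
            # Pick the predecessor that maximises the path log-probability into s.
            p = max(STATES, key=lambda r: prev[r] + log(TRANS[r][s]))
            ptrs.append(p)
            best.append(prev[p] + log(TRANS[p][s]) + log(EMIT[s][o]))
        back.append(ptrs)
    # Trace back from the most likely final state.
    state = max(STATES, key=lambda s: best[s])
    path = [state]
    for ptrs in reversed(back):
        state = ptrs[state]
        path.append(state)
    return path[::-1]


edits = [0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0]
print(viterbi(edits))  # → [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0, 0]
```

With these assumed parameters, the isolated revert at position 3 does not flip the regime, while the sustained run of reverts is decoded as a "war" episode. The sticky transition matrix is what produces this bursty, trapped-in-a-subspace behaviour.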
A Novel Distributed Privacy Paradigm for Visual Sensor Networks Based on Sharing Dynamical Systems
Visual sensor networks (VSNs) provide surveillance images and video, which must be protected from eavesdropping and tampering en route to the base station. In the spirit of sensor networks, we propose a novel paradigm for securing privacy and confidentiality in a distributed manner. Our paradigm is based on the control of dynamical systems, which we show is well suited for VSNs due to its low complexity in terms of processing and communication, while achieving robustness to both unintentional noise and intentional attacks as long as only a small subset of nodes is affected. We also present a low-complexity algorithm called TANGRAM to demonstrate the feasibility of applying our novel paradigm to VSNs. We present and discuss simulation results for TANGRAM.
Expressing social attitudes in virtual agents for social training games
The use of virtual agents in social coaching has increased rapidly in the
last decade. To train users in the different situations that can occur
in real life, the virtual agent should be able to express different social
attitudes. In this paper, we propose a model of social attitudes that enables a
virtual agent to reason about the appropriate social attitude to express during
an interaction with a user, given the course of the interaction as well as the
agent's emotions, mood, and personality. The model also enables the
virtual agent to display its social attitude through its non-verbal behaviour.
The proposed model has been developed in the context of job interview
simulation. The methodology used to develop it combined a theoretical
and an empirical approach: the model is based on the Human and Social Sciences
literature on social attitudes, on the analysis of an audiovisual corpus of job
interviews, and on post-hoc interviews with the recruiters about the attitudes
they expressed during the job interviews.
Interactivity And User-heterogeneity In On Demand Broadcast Video
Video-on-demand (VOD) has emerged as an important technology for many multimedia applications such as news on demand, digital libraries, home entertainment, and distance learning. In its simplest form, delivering a video stream requires a dedicated channel for each video session, a scheme that is very expensive and does not scale. To conserve server bandwidth, many users can share a channel using multicast. Two types of multicast have been considered. In a non-periodic multicast setting, users send video requests to the server, which serves them according to some scheduling policy. In a periodic broadcast environment, the server does not wait for service requests; it broadcasts a video cyclically, e.g., a new stream of the same video is started every t seconds. Although this approach does not provide true VOD, the worst service latency experienced by any client is less than t seconds. A distinct advantage of this approach is that it can serve a very large community of users with minimal server bandwidth. In a VOD system, it is desirable to provide users with video-cassette-recorder-like (VCR) capabilities such as fast-forwarding a video or jumping to a specific frame. This work addresses the issue in the broadcast framework, where each video and its interactive version are broadcast repeatedly on the network. Existing techniques rely on data prefetching to provide this functionality, which limits usability because the prefetching rate cannot keep up with typical fast-forward speeds. In the same environment, end users might have access to different bandwidths at different times. Current periodic broadcast schemes neither take advantage of high-bandwidth capabilities nor adapt to the low-bandwidth limitations of receivers. A heterogeneous technique is presented that can adapt to a range of receiving bandwidth capabilities.
Given a server bandwidth and a range of client bandwidths, users employing the proposed technique can either use their full reception bandwidth and thus access the video with a very short latency, or use only a portion of it, possibly just the minimum required, at the expense of a longer access latency.
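The periodic broadcast scheme described above (a new stream of the same video every t seconds) can be illustrated with simple arithmetic. This is a sketch of plain staggered broadcasting, not of the heterogeneous technique the abstract proposes; it assumes streams start at times 0, t, 2t, ..., and the helper names and the 2-hour video with t = 300 s are hypothetical. It shows why the worst-case wait stays below t, and why the number of server channels depends on the video length and t rather than on the number of clients.

```python
import math


def channels_needed(video_len_s: float, period_s: float) -> int:
    """Concurrent streams when a new copy of the video starts every period_s seconds.

    Each stream lasts video_len_s, so at most ceil(video_len_s / period_s) copies overlap.
    """
    return math.ceil(video_len_s / period_s)


def wait_time(arrival_s: float, period_s: float) -> float:
    """Seconds until the next stream start, assuming starts at 0, period_s, 2*period_s, ..."""
    return (-arrival_s) % period_s


# Hypothetical numbers: a 2-hour video with a new stream every 5 minutes.
video_len, t = 7200, 300
print(channels_needed(video_len, t))                # 24
print(max(wait_time(a, t) for a in range(10_000)))  # 299, always below t
```

Shrinking t reduces the worst-case latency but raises the channel count proportionally, which is the bandwidth/latency trade-off that periodic broadcast schemes, including heterogeneous ones, have to balance.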
- …