4,437 research outputs found

The Archive Query Log: Mining Millions of Search Result Pages of Hundreds of Search Engines from 25 Years of Web Archives

Author: Fröbe Maik
Gienapp Lukas
Hagen Matthias
Potthast Martin
Reimer Jan Heinrich
Scells Harrisen
Schmidt Sebastian
Stein Benno
Publication date: 31/07/2023

The Archive Query Log (AQL) is a previously unused, comprehensive query log collected at the Internet Archive over the last 25 years. Its first version includes 356 million queries, 166 million search result pages, and 1.7 billion search results across 550 search providers. Although many query logs have been studied in the literature, the search providers that own them generally do not publish their logs to protect user privacy and vital business data. Of the few query logs publicly available, none combines size, scope, and diversity. The AQL is the first to do so, enabling research on new retrieval models and (diachronic) search engine analyses. Provided in a privacy-preserving manner, it promotes open research as well as more transparency and accountability in the search industry.

Comment: SIGIR 2023 resource paper, 13 pages

arXiv.org e-Print Archive

Conflict and Computation on Wikipedia: a Finite-State Machine Analysis of Editor Interactions

Author: DeDeo Simon
Publication venue: MDPI AG
Publication date: 01/07/2016

What is the boundary between a vigorous argument and a breakdown of relations? What drives a group of individuals across it? Taking Wikipedia as a test case, we use a hidden Markov model to approximate the computational structure and social grammar of more than a decade of cooperation and conflict among its editors. Across a wide range of pages, we discover a bursty war/peace structure where the systems can become trapped, sometimes for months, in a computational subspace associated with significantly higher levels of conflict-tracking "revert" actions. Distinct patterns of behavior characterize the lower-conflict subspace, including tit-for-tat reversion. While a fraction of the transitions between these subspaces are associated with top-down actions taken by administrators, the effects are weak. Surprisingly, we find no statistical signal that transitions are associated with the appearance of particularly anti-social users, and only weak association with significant news events outside the system. These findings are consistent with transitions being driven by decentralized processes with no clear locus of control. Models of belief revision in the presence of a common resource for information-sharing predict the existence of two distinct phases: a disordered high-conflict phase, and a frozen phase with spontaneously-broken symmetry. The bistability we observe empirically may be a consequence of editor turn-over, which drives the system to a critical point between them.

Comment: 23 pages, 3 figures. Matches published version. Code for HMM fitting available at http://bit.ly/sfihmm; time series and derived finite state machines at bit.ly/wiki_hm

arXiv.org e-Print Archive

Multidisciplinary Digital Publishing Institute

Directory of Open Access Journals

Adaptive hypertext and hypermedia : proceedings of the 2nd workshop, Pittsburgh, Pa., June 20-24, 1998

Publication venue: Technische Universiteit Eindhoven
Publication date: 01/01/1998

Pure OAI Repository

A Novel Distributed Privacy Paradigm for Visual Sensor Networks Based on Sharing Dynamical Systems

Author: Kundur Deepa
Luh William
Zourntos Takis
Publication venue: Hindawi Limited
Publication date: 01/01/2006

Visual sensor networks (VSNs) provide surveillance images/video which must be protected from eavesdropping and tampering en route to the base station. In the spirit of sensor networks, we propose a novel paradigm for securing privacy and confidentiality in a distributed manner. Our paradigm is based on the control of dynamical systems, which we show is well suited for VSNs due to its low complexity in terms of processing and communication, while achieving robustness to both unintentional noise and intentional attacks as long as a small subset of nodes is affected. We also present a low-complexity algorithm called TANGRAM to demonstrate the feasibility of applying our novel paradigm to VSNs. We present and discuss simulation results of TANGRAM.

Springer - Publisher Connector

Directory of Open Access Journals

Texas A&M Repository

Expressing social attitudes in virtual agents for social training games

Author: Chollet Mathieu
Jones Hazaël
Ochs Magalie
Pelachaud Catherine
Sabouret Nicolas
Publication date: 20/02/2014

The use of virtual agents in social coaching has increased rapidly in the last decade. In order to train the user in different situations that can occur in real life, the virtual agent should be able to express different social attitudes. In this paper, we propose a model of social attitudes that enables a virtual agent to reason on the appropriate social attitude to express during the interaction with a user given the course of the interaction, but also the emotions, mood and personality of the agent. Moreover, the model enables the virtual agent to display its social attitude through its non-verbal behaviour. The proposed model has been developed in the context of job interview simulation. The methodology used to develop such a model combined a theoretical and an empirical approach. Indeed, the model is based both on the literature in Human and Social Sciences on social attitudes and on the analysis of an audiovisual corpus of job interviews and of post-hoc interviews with the recruiters on their expressed attitudes during the job interview.

arXiv.org e-Print Archive

Interactivity And User-heterogeneity In On Demand Broadcast Video

Author: Tantaoui El Araki Mounir
Publication venue: University of Central Florida
Publication date: 01/01/2004

Video-On-Demand (VOD) has emerged as an important technology for many multimedia applications such as news on demand, digital libraries, home entertainment, and distance learning. In its simplest form, delivery of a video stream requires a dedicated channel for each video session. This scheme is very expensive and non-scalable. To preserve server bandwidth, many users can share a channel using multicast. Two types of multicast have been considered. In a non-periodic multicast setting, users make video requests to the server, and it serves them according to some scheduling policy. In a periodic broadcast environment, the server does not wait for service requests. It broadcasts a video cyclically, e.g., a new stream of the same video is started every t seconds. Although this type of approach does not guarantee true VOD, the worst service latency experienced by any client is less than t seconds. A distinct advantage of this approach is that it can serve a very large community of users using minimal server bandwidth. In a VOD system, it is desirable to provide the user with video-cassette-recorder-like (VCR) capabilities such as fast-forwarding a video or jumping to a specific frame. This issue is addressed in the broadcast framework, where each video and its interactive version are broadcast repeatedly on the network. Existing techniques rely on data prefetching as the mechanism to provide this functionality. This approach provides limited usability since the prefetching rate cannot keep up with typical fast-forward speeds. In the same environment, end users might have access to different bandwidth capabilities at different times. Current periodic broadcast schemes do not take advantage of high-bandwidth capabilities, nor do they adapt to the low-bandwidth limitation of the receivers. A heterogeneous technique is presented that can adapt to a range of receiving bandwidth capabilities.
Given a server bandwidth and a range of different client bandwidths, users employing the proposed technique can choose either to use their full reception bandwidth and thereby access the video after a very short time, or to use only part of their reception bandwidth at the expense of a longer access latency.
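
The latency bound for periodic broadcast stated in the abstract above can be illustrated with a minimal sketch. It assumes the simplest possible schedule, in which a new stream of the same video starts at times 0, t, 2t, and so on; the function name `wait_time` and the 30-second period are hypothetical choices for illustration and do not come from the thesis.

```python
import math


def wait_time(arrival: float, period: float) -> float:
    """Seconds a client arriving at `arrival` waits for the next stream start.

    Assumption (illustrative only): streams of the same video start at
    times 0, period, 2 * period, ... with no other scheduling effects.
    """
    if period <= 0:
        raise ValueError("period must be positive")
    # The next stream start is the smallest multiple of `period` >= arrival.
    next_start = math.ceil(arrival / period) * period
    return next_start - arrival


# Hypothetical example: a new stream every 30 seconds.
period = 30.0
arrivals = [0.0, 1.0, 29.0, 30.0, 44.5]
waits = [wait_time(a, period) for a in arrivals]

for a, w in zip(arrivals, waits):
    print(f"arrival at {a:5.1f} s waits {w:5.1f} s")
```

Under this assumed schedule every wait falls in the half-open interval [0, t), which matches the abstract's statement that the worst service latency is less than t seconds regardless of how many clients arrive.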

University of Central Florida (UCF): STARS (Showcase of Text, Archives, Research & Scholarship)