Learning about Users from Observation
Many approaches and systems for recommending information, goods, or other kinds of objects have been developed in recent years. In these systems, machine learning methods are often used that need training input to acquire a user interest profile. Such methods typically need positive and negative evidence of the user’s interests. To obtain both kinds of evidence, many systems make users rate relevant objects explicitly. Others merely observe the user’s behavior, which yields positive evidence only; in order to be able to apply the standard learning methods, these systems mostly use heuristics to also find negative evidence in observed behavior.
In this paper, we present an approach for learning interest profiles from positive evidence only, as it is contained in observed user behavior. Thus, both the problem of interrupting the user for ratings and the problem of somewhat artificially determining negative evidence are avoided.
A methodology for learning explicit user profiles and recommending interesting objects has been developed. It is used in the context of ELFI – a Web-based information system. The evaluation results are briefly described in this paper.
Our current efforts revolve around further improvements of the methodology and its implementation for recommending interesting web pages to users of a web browser.
A Methodology for Information Flow Experiments
Information flow analysis has largely ignored the setting where the analyst
has neither control over nor a complete model of the analyzed system. We
formalize such limited information flow analyses and study an instance of it:
detecting the usage of data by websites. We prove that these problems are ones
of causal inference. Leveraging this connection, we push beyond traditional
information flow analysis to provide a systematic methodology based on
experimental science and statistical analysis. Our methodology allows us to
systematize prior works in the area viewing them as instances of a general
approach. Our systematic study leads to practical advice for improving work on
detecting data usage, a previously unformalized area. We illustrate these
concepts with a series of experiments collecting data on the use of information
by websites, which we statistically analyze.
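The abstract above does not say which statistical test is applied. As a minimal sketch, assuming a randomized experiment in which some browser instances receive a treatment and others serve as controls, a permutation test on the difference in group means is one common choice. The function name, parameters, and measurements below are hypothetical and are not taken from the paper.

```python
import random


def permutation_test(treated, control, trials=10_000, seed=0):
    """Two-sided Monte Carlo permutation test on the difference in group means.

    Hypothetical helper: `treated` and `control` are per-unit measurements,
    for example how many ads of some category each browser instance saw.
    """
    rng = random.Random(seed)
    observed = abs(sum(treated) / len(treated) - sum(control) / len(control))
    pooled = list(treated) + list(control)
    n = len(treated)
    extreme = 0
    for _ in range(trials):
        # Under the null hypothesis the group labels are exchangeable,
        # so reshuffling them gives an equally likely outcome.
        rng.shuffle(pooled)
        diff = abs(sum(pooled[:n]) / n - sum(pooled[n:]) / (len(pooled) - n))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimate away from an impossible p = 0.
    return (extreme + 1) / (trials + 1)


# Made-up measurements for illustration only.
p_value = permutation_test([9, 10, 11, 12, 10, 9, 11, 10],
                           [1, 2, 1, 0, 2, 1, 1, 2])
```

A small p-value suggests that the treatment changed the measured behaviour. The test relies only on random assignment of units to groups, not on a model of the website.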
Privacy Issues of the W3C Geolocation API
The W3C's Geolocation API may rapidly standardize the transmission of
location information on the Web, but, in dealing with such sensitive
information, it also raises serious privacy concerns. We analyze the manner and
extent to which the current W3C Geolocation API provides mechanisms to support
privacy. We propose a privacy framework for the consideration of location
information and use it to evaluate the W3C Geolocation API, both the
specification and its use in the wild, and recommend some modifications to the
API as a result of our analysis.
Is the Web ready for HTTP/2 Server Push?
HTTP/2 supersedes HTTP/1.1 to tackle the performance challenges of the modern
Web. A highly anticipated feature is Server Push, enabling servers to send data
without explicit client requests, thus potentially saving time. Although
guidelines on how to use Server Push emerged, measurements have shown that it
can easily be used in a suboptimal way and hurt instead of improving
performance. We thus tackle the question of whether the current Web can make better use
of Server Push. First, we enable real-world websites to be replayed in a
testbed to study the effects of different Server Push strategies. Using this,
we next revisit proposed guidelines to grasp their performance impact. Finally,
based on our results, we propose a novel strategy using an alternative server
scheduler that enables resources to be interleaved. This improves the visual
progress for some websites, with minor modifications to the deployment. Still,
our results highlight the limits of Server Push: a deep understanding of web
engineering is required to make optimal use of it, and not every site will
benefit.
Comment: More information available at https://push.netray.i
The Network Effects of Prefetching
Prefetching has been shown to be an effective technique for reducing user-perceived latency in distributed systems. In this paper we show that even when prefetching adds no extra traffic to the network, it can have serious negative performance effects. Straightforward approaches to prefetching increase the burstiness of individual sources, leading to increased average queue sizes in network switches. However, we also show that applications can avoid the undesirable queueing effects of prefetching. In fact, we show that applications employing prefetching can significantly improve network performance, to a level much better than that obtained without any prefetching at all. This is because prefetching offers increased opportunities for traffic shaping that are not available in the absence of prefetching. Using a simple transport rate control mechanism, a prefetching application can modify its behavior from a distinctly ON/OFF entity to one whose data transfer rate changes less abruptly, while still delivering all data in advance of the user's actual requests.
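The shaping idea can be illustrated with a small calculation. This is a sketch under stated assumptions, not the rate control mechanism from the paper: it assumes the application knows in advance when each object will be needed, and it computes the smallest constant rate that still delivers every object before its deadline. The function name and the request list are hypothetical.

```python
def min_constant_rate(requests):
    """Smallest constant send rate (bytes/s) that meets every deadline.

    `requests` is a list of (deadline_seconds, size_bytes) pairs, with
    transmission starting at t = 0 and every deadline greater than zero.
    Objects are sent in deadline order, so the rate must cover the
    cumulative bytes due by each deadline.
    """
    cumulative = 0
    rate = 0.0
    for deadline, size in sorted(requests):
        cumulative += size
        rate = max(rate, cumulative / deadline)
    return rate


# Three hypothetical objects needed at t = 10 s, 20 s and 30 s.
requests = [(10, 1000), (20, 1000), (30, 4000)]
smooth_rate = min_constant_rate(requests)  # 200.0 bytes/s
```

Fetching the same 4000-byte object on demand would mean sending it as a single burst at link speed when the user asks for it. Sending at 200 bytes/s from the start delivers the same data with no bursts, which is the kind of smoothing the abstract attributes to prefetching.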
Traffic measurement and analysis
Measurement and analysis of real traffic is important to gain knowledge
about the characteristics of the traffic. Without measurement, it is
impossible to build realistic traffic models. Only recently was data
traffic found to have self-similar properties. In this thesis work,
traffic captured on the network at SICS and on the Supernet is shown to
have this fractal-like behaviour. The traffic is also examined with
respect to which protocols and packet sizes are present and in what
proportions. In the SICS trace most packets are small, TCP is shown to be
the predominant transport protocol and NNTP the most common application.
In contrast to this, large UDP packets sent between non-well-known ports
dominate the Supernet traffic. Finally, characteristics of the client
side of the WWW traffic are examined more closely. In order to extract
useful information from the packet trace, web browsers' use of TCP and HTTP
is investigated, including new features in HTTP/1.1 such as persistent
connections and pipelining. Empirical probability distributions are
derived describing session lengths, time between user clicks and the
amount of data transferred due to a single user click. These probability
distributions make up a simple model of WWW-sessions.
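The abstract does not state how self-similarity was tested. One standard check is the aggregated-variance (variance-time) method, sketched here as an assumption rather than as the thesis's procedure. For a self-similar series with Hurst parameter H, the variance of block means over blocks of size m decays as m^(2H - 2), so H can be read off the slope of a log-log fit. The function name and the synthetic input are hypothetical.

```python
import math
import random
import statistics


def hurst_aggregated_variance(series, block_sizes=(1, 2, 4, 8, 16, 32, 64)):
    """Estimate the Hurst parameter H with the aggregated-variance method."""
    xs, ys = [], []
    for m in block_sizes:
        blocks = len(series) // m
        # Average the series over non-overlapping blocks of length m.
        means = [sum(series[i * m:(i + 1) * m]) / m for i in range(blocks)]
        xs.append(math.log(m))
        ys.append(math.log(statistics.pvariance(means)))
    # Ordinary least-squares slope of log(variance) against log(m).
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    # The slope equals 2H - 2, so H = 1 + slope / 2.
    return 1 + slope / 2


# Synthetic independent noise stands in for per-interval packet counts.
rng = random.Random(1)
noise = [rng.gauss(0, 1) for _ in range(8192)]
h_noise = hurst_aggregated_variance(noise)
```

Independent noise should give an estimate near 0.5, while estimates clearly above 0.5 indicate long-range dependence of the kind reported for data traffic. The estimator is coarse and is usually cross-checked with other methods.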
Experiences of aiding autobiographical memory using the sensecam
Human memory is a dynamic system that makes accessible certain memories of events based on a hierarchy of information, arguably driven by personal significance. Not all events are remembered, but those that are tend to be more psychologically relevant. In contrast, lifelogging is the process of automatically recording aspects of one's life in digital form without loss of information. In this article we share our experiences in designing computer-based solutions to help people review their visual lifelogs and address this contrast. The technical basis for our work is automatically segmenting visual lifelogs into events, allowing event similarity and event importance to be computed, ideas that are motivated by cognitive science considerations of how human memory works and can be assisted. Our work has been based on visual lifelogs gathered by dozens of people, some of them with collections spanning multiple years. In this review article we summarize a series of studies that have led to the development of a browser that is based on human memory systems and discuss the inherent tension in storing large amounts of data but making the most relevant material the most accessible.
PrivacyScore: Improving Privacy and Security via Crowd-Sourced Benchmarks of Websites
Website owners make conscious and unconscious decisions that affect their
users, potentially exposing them to privacy and security risks in the process.
In this paper we introduce PrivacyScore, an automated website scanning portal
that allows anyone to benchmark security and privacy features of multiple
websites. In contrast to existing projects, the checks implemented in
PrivacyScore cover a wider range of potential privacy and security issues.
Furthermore, users can control the ranking and analysis methodology. Therefore,
PrivacyScore can also be used by data protection authorities to perform
regularly scheduled compliance checks. In the long term we hope that the
transparency resulting from the published benchmarks creates an incentive for
website owners to improve their sites. The public availability of a first
version of PrivacyScore was announced at the ENISA Annual Privacy Forum in June
2017.
Comment: 14 pages, 4 figures. A German version of this paper discussing the
legal aspects of this system is available at arXiv:1705.0888