54 research outputs found
Boilerplate Removal using a Neural Sequence Labeling Model
The extraction of main content from web pages is an important task for
numerous applications, ranging from usability aspects, like reader views for
news articles in web browsers, to information retrieval or natural language
processing. Existing approaches are lacking as they rely on large amounts of
hand-crafted features for classification. This results in models that are
tailored to a specific distribution of web pages, e.g. from a certain time
frame, but lack in generalization power. We propose a neural sequence labeling
model that does not rely on any hand-crafted features but takes only the HTML
tags and words that appear in a web page as input. This allows us to present a
browser extension which highlights the content of arbitrary web pages directly
within the browser using our model. In addition, we create a new, more current
dataset to show that our model is able to adapt to changes in the structure of
web pages and outperform the state-of-the-art model.Comment: WWW20 Demo pape
User Fairness in Recommender Systems
Recent works in recommendation systems have focused on diversity in
recommendations as an important aspect of recommendation quality. In this work
we argue that the post-processing algorithms aimed at only improving diversity
among recommendations lead to discrimination among the users. We introduce the
notion of user fairness which has been overlooked in literature so far and
propose measures to quantify it. Our experiments on two diversification
algorithms show that an increase in aggregate diversity results in increased
disparity among the users
High-Performance Reachability Query Processing under Index Size Restrictions
In this paper, we propose a scalable and highly efficient index structure for
the reachability problem over graphs. We build on the well-known node interval
labeling scheme where the set of vertices reachable from a particular node is
compactly encoded as a collection of node identifier ranges. We impose an
explicit bound on the size of the index and flexibly assign approximate
reachability ranges to nodes of the graph such that the number of index probes
to answer a query is minimized. The resulting tunable index structure generates
a better range labeling if the space budget is increased, thus providing a
direct control over the trade off between index size and the query processing
performance. By using a fast recursive querying method in conjunction with our
index structure, we show that in practice, reachability queries can be answered
in the order of microseconds on an off-the-shelf computer - even for the case
of massive-scale real world graphs. Our claims are supported by an extensive
set of experimental results using a multitude of benchmark and real-world
web-scale graph datasets.Comment: 30 page
- …