Boilerplate Removal using a Neural Sequence Labeling Model
The extraction of main content from web pages is an important task for
numerous applications, ranging from usability aspects, like reader views for
news articles in web browsers, to information retrieval or natural language
processing. Existing approaches fall short because they rely on large numbers of
hand-crafted features for classification. This results in models that are
tailored to a specific distribution of web pages, e.g. from a certain time
frame, but generalize poorly. We propose a neural sequence labeling
model that does not rely on any hand-crafted features but takes only the HTML
tags and words that appear in a web page as input. This allows us to present a
browser extension which highlights the content of arbitrary web pages directly
within the browser using our model. In addition, we create a new, more current
dataset to show that our model is able to adapt to changes in the structure of
web pages and outperform the state-of-the-art model.
Comment: WWW20 demo paper
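The abstract states that the model takes only the HTML tags and words of a page as input. One plausible way to build such an input sequence is sketched below with Python's standard `html.parser`. The token format (`<tag>`, `</tag>`, whitespace-split words) and the `tokenize` helper are assumptions for illustration, not the authors' preprocessing, and the labeling model itself is not shown.

```python
from html.parser import HTMLParser


class TagWordTokenizer(HTMLParser):
    """Flatten an HTML document into one sequence of tag and word tokens (assumed format)."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        self.tokens.append(f"<{tag}>")  # attributes are ignored in this sketch

    def handle_endtag(self, tag):
        self.tokens.append(f"</{tag}>")

    def handle_data(self, data):
        self.tokens.extend(data.split())  # text nodes become individual words


def tokenize(html):
    """Hypothetical helper: return the tag/word sequence for an HTML string."""
    parser = TagWordTokenizer()
    parser.feed(html)
    parser.close()
    return parser.tokens


print(tokenize("<div><p>Main story text</p><nav>Home About</nav></div>"))
```

A sequence labeling model would then assign each word token a content or boilerplate label. This sketch does not filter `<script>` or `<style>` text.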
SMART-KG: Hybrid Shipping for SPARQL Querying on the Web
While Linked Data (LD) provides standards for publishing (RDF) and querying (SPARQL) Knowledge Graphs (KGs) on the Web, serving, accessing and processing such open, decentralized KGs is often practically impossible, as query timeouts on publicly available SPARQL endpoints show. Alternative solutions such as Triple Pattern Fragments (TPF) attempt to tackle the availability problem by pushing the query processing workload to the client side, but suffer from unnecessary transfer of irrelevant data on complex queries with large intermediate results. In this paper we present smart-KG, a novel approach to share the load between servers and clients while significantly reducing data transfer volume, by combining TPF with shipping compressed KG partitions. Our evaluations show that smart-KG outperforms state-of-the-art client-side solutions and increases server-side availability, enabling more cost-effective and balanced hosting of open and decentralized KGs.
Series: Working Papers on Information Systems, Information Business and Operations
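To illustrate the client-side processing that TPF relies on, the sketch below has a server that answers only single triple patterns and a client that performs the join itself. The toy triples, the query, and the `fragment` and `client_join` helpers are hypothetical. The sketch shows plain TPF-style evaluation, not smart-KG's shipping of compressed partitions.

```python
# Hypothetical toy KG; each tuple is (subject, predicate, object).
TRIPLES = [
    (":alice", ":knows", ":bob"),
    (":alice", ":knows", ":carol"),
    (":bob", ":livesIn", ":Vienna"),
    (":carol", ":livesIn", ":Graz"),
]


def fragment(triples, s=None, p=None, o=None):
    """Server side: return every triple matching a single triple pattern.

    None acts as a variable; each call stands for one request to a TPF server.
    """
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def client_join(triples):
    """Client side: evaluate ?person :knows ?friend . ?friend :livesIn :Vienna."""
    results = []
    # First fragment: all :knows triples are shipped to the client.
    for person, _, friend in fragment(triples, p=":knows"):
        # One further request per intermediate binding of ?friend.
        if fragment(triples, s=friend, p=":livesIn", o=":Vienna"):
            results.append((person, friend))
    return results


print(client_join(TRIPLES))
```

Every intermediate binding of `?friend` triggers another request. Queries with large intermediate results therefore cause many requests and heavy data transfer, which is the overhead the abstract attributes to TPF.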
Layered Graph Embedding for Entity Recommendation using Wikipedia in the Yahoo! Knowledge Graph
In this paper, we describe an embedding-based entity recommendation framework
for Wikipedia that organizes Wikipedia into a collection of graphs layered on
top of each other, learns complementary entity representations from their
topology and content, and combines them with a lightweight learning-to-rank
approach to recommend related entities on Wikipedia. Through offline and online
evaluations, we show that the resulting embeddings and recommendations perform
well in terms of quality and user engagement. Balancing simplicity and quality,
this framework provides default entity recommendations for English and other
languages in the Yahoo! Knowledge Graph, of which Wikipedia is a core subset.
Comment: 8 pages, 4 figures, 8 tables. To appear in Wiki Workshop 2020,
Companion Proceedings of the Web Conference 2020 (WWW '20 Companion), Taipei,
Taiwan
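As a rough illustration of combining entity representations from several graph layers, the sketch below scores candidate entities by a weighted sum of per-layer cosine similarities. The layer names, vectors and fixed weights are hypothetical. In the described framework the combination is learned with a learning-to-rank model rather than set by hand.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recommend(query, layers, weights, k=2):
    """Rank entities related to `query` by a weighted sum of per-layer similarities.

    layers: dict layer name -> {entity: embedding vector}
    weights: dict layer name -> weight (fixed here; learned in a real ranker)
    """
    # Only entities embedded in every layer can be scored on all layers.
    candidates = set.intersection(*(set(emb) for emb in layers.values())) - {query}
    scores = {
        entity: sum(weights[name] * cosine(emb[query], emb[entity])
                    for name, emb in layers.items())
        for entity in candidates
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical two-layer toy embeddings.
LAYERS = {
    "links": {"Paris": [1.0, 0.0], "France": [0.9, 0.1], "Tokyo": [0.0, 1.0]},
    "text": {"Paris": [0.0, 1.0], "France": [0.2, 0.8], "Tokyo": [0.7, 0.3]},
}
WEIGHTS = {"links": 0.6, "text": 0.4}
print(recommend("Paris", LAYERS, WEIGHTS))
```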
Towards Complete Decentralised Verification of Data with Confidentiality: Different ways to connect Solid Pods and Blockchain
Over-centralisation of data leads to tampering and to user information being shared without the owners' consent. This problem has been studied extensively in recent years, yielding separate solutions involving distributed storage, Blockchain technology and Solid Pods. Individually, these solutions are not sufficient to build realistic applications in a decentralised environment; combined, however, they can support more powerful and useful use-cases. In this paper, we propose methods for combining Solid Pods and distributed ledgers to achieve complete decentralisation of data with full user control, while keeping the integrity of the stored information intact through Blockchain-based verification. We demonstrate multiple configurations of our solution, offering several new use-cases in various sectors. These configurations add new dimensions to data storage for Web and mobile applications, from which developers building Distributed Applications (DApps) in a completely decentralised environment can benefit.
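The Blockchain-based verification idea can be sketched as anchoring a hash of each Pod resource and later recomputing it to detect tampering. In the sketch below, an in-memory, append-only `HypotheticalLedger` class stands in for a distributed ledger, and the resource URI is invented. No Solid or blockchain API is used, and this is not one of the authors' configurations.

```python
import hashlib


def digest(data: bytes) -> str:
    """SHA-256 fingerprint of a resource's bytes."""
    return hashlib.sha256(data).hexdigest()


class HypotheticalLedger:
    """In-memory stand-in for a blockchain: records are only ever appended."""

    def __init__(self):
        self._records = []  # list of (uri, hash) in insertion order

    def anchor(self, uri: str, data: bytes) -> None:
        """Record the hash of the current contents of a Pod resource."""
        self._records.append((uri, digest(data)))

    def latest_hash(self, uri: str):
        """Most recently anchored hash for `uri`, or None if never anchored."""
        for recorded_uri, recorded_hash in reversed(self._records):
            if recorded_uri == uri:
                return recorded_hash
        return None

    def verify(self, uri: str, data: bytes) -> bool:
        """True if data fetched from the Pod matches the latest anchored hash."""
        return self.latest_hash(uri) == digest(data)


ledger = HypotheticalLedger()
uri = "https://alice.example/pod/notes.ttl"  # hypothetical Pod resource
ledger.anchor(uri, b"original contents")
print(ledger.verify(uri, b"original contents"))  # True
print(ledger.verify(uri, b"tampered contents"))  # False
```

Only the hash goes to the ledger, so the Pod keeps the data itself under user control.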
RecipeGPT: Generative pre-training based cooking recipe generation and evaluation system
National Research Foundation (NRF) Singapore under International Research Centres in Singapore Funding Initiative
Unveiling Coordinated Groups Behind White Helmets Disinformation
Propaganda, disinformation, manipulation, and polarization are the modern
illnesses of a society increasingly dependent on social media as a source of
news. In this paper, we explore the disinformation campaign, sponsored by
Russia and its allies, against the Syria Civil Defense (a.k.a. the White Helmets).
We unveil coordinated groups using automatic retweets and content duplication
to promote narratives and/or accounts. The results also reveal distinct
promoting strategies, ranging from small groups sharing the exact same text
repeatedly, to complex "news website factories" where dozens of accounts
synchronously spread the same news from multiple sites.
Comment: To be presented at the WWW 2020 Workshop on Computational Methods in
Online Misbehavior and forthcoming in the Companion Proceedings of the Web
Conference 2020
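The simplest promoting strategy described above, small groups posting the exact same text, can be surfaced by grouping accounts on normalized post text. The sketch below is a minimal illustration under that assumption. The `duplicate_text_groups` helper, the `min_accounts` threshold and the sample posts are hypothetical. The sketch does not reproduce the paper's retweet analysis or its detection of news website factories.

```python
from collections import defaultdict


def duplicate_text_groups(posts, min_accounts=2):
    """Group accounts that posted the same text after normalization.

    posts: iterable of (account_id, text) pairs.
    Returns normalized text -> sorted list of distinct accounts, keeping only
    texts shared by at least `min_accounts` different accounts.
    """
    accounts_by_text = defaultdict(set)
    for account, text in posts:
        # Lower-case and collapse whitespace so trivial variations still match.
        key = " ".join(text.lower().split())
        accounts_by_text[key].add(account)
    return {
        text: sorted(accounts)
        for text, accounts in accounts_by_text.items()
        if len(accounts) >= min_accounts
    }


# Hypothetical sample posts.
posts = [
    ("acct_a", "Same narrative text"),
    ("acct_b", "same  narrative   text"),
    ("acct_c", "An unrelated post"),
    ("acct_a", "Another message"),
]
print(duplicate_text_groups(posts))
```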
Measuring Spatial Subdivisions in Urban Mobility with Mobile Phone Data
The urban population is growing constantly: by 2050, two thirds of the world's
population is projected to reside in urban areas. This growth outpaces the
ability of cities to measure and plan for their sustainability. To understand
what makes a city inclusive for all, we define a methodology to identify and
characterize spatial subdivisions: areas with over- and under-representation of
specific population groups, named hot and cold spots respectively. Using
aggregated mobile phone data, we apply this methodology to the city of
Barcelona to assess the mobility of three groups of people: women, elders, and
tourists. We find that, for all three groups, cold spots have a lower
diversity of amenities and services than hot spots. Also, cold spots of women
and tourists tend to have lower population income. These insights apply to the
floating population of Barcelona, thus augmenting the scope of how
inclusiveness can be analyzed in the city.
Comment: 10 pages, 10 figures. To be presented at the Data Science for Social
Good workshop at The Web Conference 2020
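One way to make "over- and under-representation" concrete is a location-quotient-style ratio: a group's share of the people observed in an area, divided by its share city-wide. The sketch below assumes that ratio, invented thresholds (`hot=1.25`, `cold=0.8`) and toy area counts. The paper's actual criterion for hot and cold spots may differ.

```python
def classify_spots(group_counts, total_counts, hot=1.25, cold=0.8):
    """Label areas as hot, cold or neutral spots for one population group.

    group_counts / total_counts: dicts mapping area -> people counted there.
    The ratio compares the group's local share with its city-wide share;
    the thresholds are illustrative assumptions.
    """
    city_share = sum(group_counts.values()) / sum(total_counts.values())
    if city_share == 0:
        raise ValueError("group is absent from the whole city")
    labels = {}
    for area, total in total_counts.items():
        if total == 0:
            continue  # no observations: ratio undefined
        ratio = (group_counts.get(area, 0) / total) / city_share
        if ratio >= hot:
            labels[area] = "hot"
        elif ratio <= cold:
            labels[area] = "cold"
        else:
            labels[area] = "neutral"
    return labels


# Hypothetical counts for three areas.
print(classify_spots({"A": 60, "B": 20, "C": 20}, {"A": 100, "B": 100, "C": 50}))
```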