Search CORE

1,892 research outputs found

Boilerplate Removal using a Neural Sequence Labeling Model

Author: Anand Avishek
Khosla Megha
Leonhardt Jurek
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 22/04/2020

The extraction of main content from web pages is an important task for numerous applications, ranging from usability aspects, like reader views for news articles in web browsers, to information retrieval or natural language processing. Existing approaches are lacking as they rely on large amounts of hand-crafted features for classification. This results in models that are tailored to a specific distribution of web pages, e.g. from a certain time frame, but lack generalization power. We propose a neural sequence labeling model that does not rely on any hand-crafted features but takes only the HTML tags and words that appear in a web page as input. This allows us to present a browser extension which highlights the content of arbitrary web pages directly within the browser using our model. In addition, we create a new, more current dataset to show that our model is able to adapt to changes in the structure of web pages and outperform the state-of-the-art model.

Comment: WWW20 Demo paper

arXiv.org e-Print Archive

Crossref

SMART-KG: Hybrid Shipping for SPARQL Querying on the Web

Author: Amr Azzam
Fernandez Garcia Javier David
Maribel Acosta
Polleres Axel
Publication venue: Department für Informationsverarbeitung und Prozessmanagement
Publication date: 16/01/2020

While Linked Data (LD) provides standards for publishing (RDF) and (SPARQL) querying Knowledge Graphs (KGs) on the Web, serving, accessing and processing such open, decentralized KGs is often practically impossible, as query timeouts on publicly available SPARQL endpoints show. Alternative solutions such as Triple Pattern Fragments (TPF) attempt to tackle the problem of availability by pushing query processing workload to the client side, but suffer from unnecessary transfer of irrelevant data on complex queries with large intermediate results. In this paper we present smart-KG, a novel approach to share the load between servers and clients, while significantly reducing data transfer volume, by combining TPF with shipping compressed KG partitions. Our evaluations show that smart-KG outperforms state-of-the-art client-side solutions and increases server-side availability towards more cost-effective and balanced hosting of open and decentralized KGs.

Series: Working Papers on Information Systems, Information Business and Operations

Elektronische Publikationen der Wirtschaftsuniversität Wien

Unbiased Learning to Rank: Counterfactual and Online Approaches

Author: de Rijke M.
Jagerman R.
Oosterhuis H.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2020

International Migration, Integration and Social Cohesion online publications

Layered Graph Embedding for Entity Recommendation using Wikipedia in the Yahoo! Knowledge Graph

Author: Aggarwal Nitish
Dai M.
Henry W
Le Quoc
Liu W
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 14/04/2020

In this paper, we describe an embedding-based entity recommendation framework for Wikipedia that organizes Wikipedia into a collection of graphs layered on top of each other, learns complementary entity representations from their topology and content, and combines them with a lightweight learning-to-rank approach to recommend related entities on Wikipedia. Through offline and online evaluations, we show that the resulting embeddings and recommendations perform well in terms of quality and user engagement. Balancing simplicity and quality, this framework provides default entity recommendations for English and other languages in the Yahoo! Knowledge Graph, of which Wikipedia is a core subset.

Comment: 8 pages, 4 figures, 8 tables. To appear in Wiki Workshop 2020, Companion Proceedings of the Web Conference 2020 (WWW 20 Companion), Taipei, Taiwan

arXiv.org e-Print Archive

Crossref


Towards Complete Decentralised Verification of Data with Confidentiality: Different ways to connect Solid Pods and Blockchain

Author: Bachler Michelle
Chowdhury Niaz
Domingue John
Quick Kevin
Ramachandran Manoharan
Third Allan
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 01/01/2020

Over-centralisation of data leads to tampering and to the sharing of user information without the consent of its owners. This problem has been studied extensively in recent times, yielding separate solutions involving distributed storage, Blockchain technology and Solid Pods. Individually these solutions are not sufficient to build realistic applications in a decentralised environment; however, a combination of them can effectively provide more powerful and useful use-cases. In this paper, we propose methods of combining Solid Pods and distributed ledgers to introduce complete decentralisation of data with total user-control, keeping the integrity of the stored information intact through Blockchain-based verification. We demonstrate multiple configurations of our solutions, offering several new use-cases in various sectors. These configurations introduce new dimensions to Web and mobile applications’ data storage that developers can benefit from when building Distributed Applications (DApps) in a completely decentralised environment.

Open Research Online (The Open University)

RecipeGPT: Generative pre-training based cooking recipe generation and evaluation system

Author: ACHANANUPARP Palakorn
LEE Helena Huey Chong
LIM Ee-peng
LIU Yue
PRASETYO Philips Kokoh
SHU Ke
VARSHNEY Lav R.
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 05/03/2020

National Research Foundation (NRF) Singapore under International Research Centres in Singapore Funding Initiative

arXiv.org e-Print Archive

Crossref

Institutional Knowledge at Singapore Management University

Capturing Evolution in Word Usage: Just Add More Clusters?

Author: Aitchison Jean
Alagić Domagoj
Devlin Jacob
Goldberg Yoav
Kutuzov Andrey
Publication venue: ACM
Publication date: 01/01/2020

Peer reviewed

arXiv.org e-Print Archive

Crossref

ZENODO

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Helsingin yliopiston digitaalinen arkisto

Unveiling Coordinated Groups Behind White Helmets Disinformation

Author: Deb Ashok
Ferrara Emilio
Keller B.
Levinger Matthew
Prier Jarred
Ratcliff W
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 02/03/2020

Propaganda, disinformation, manipulation, and polarization are the modern illnesses of a society increasingly dependent on social media as a source of news. In this paper, we explore the disinformation campaign, sponsored by Russia and allies, against the Syria Civil Defense (a.k.a. the White Helmets). We unveil coordinated groups using automatic retweets and content duplication to promote narratives and/or accounts. The results also reveal distinct promoting strategies, ranging from small groups sharing the exact same text repeatedly to complex "news website factories" where dozens of accounts synchronously spread the same news from multiple sites.

Comment: To be presented at WWW 2020 Workshop on Computational Methods in Online Misbehavior and forthcoming in the Companion Proceedings of the Web Conference 2020

arXiv.org e-Print Archive

Crossref

Measuring Spatial Subdivisions in Urban Mobility with Mobile Phone Data

Author: Cucchietti Fernando M.
Graells-Garrido Eduardo
Meta Irene
Reyes Patricio
Serra-Burriel Feliu
Publication date: 01/01/2020

Urban population grows constantly. By 2050 two thirds of the world population will reside in urban areas. This growth is faster and more complex than the ability of cities to measure and plan for their sustainability. To understand what makes a city inclusive for all, we define a methodology to identify and characterize spatial subdivisions: areas with over- and under-representation of specific population groups, named hot and cold spots respectively. Using aggregated mobile phone data, we apply this methodology to the city of Barcelona to assess the mobility of three groups of people: women, elders, and tourists. We find that, within the three groups, cold spots have a lower diversity of amenities and services than hot spots. Also, cold spots of women and tourists tend to have lower population income. These insights apply to the floating population of Barcelona, thus augmenting the scope of how inclusiveness can be analyzed in the city.

Comment: 10 pages, 10 figures. To be presented at the Data Science for Social Good workshop at The Web Conference 2020

arXiv.org e-Print Archive

Crossref

UPCommons. Portal del coneixement obert de la UPC

Distributed continuous home care provisioning through personalized monitoring & treatment planning

Author: Arndt Dörthe
Bonte Pieter
De Brouwer Mathias
De Turck Filip
Dimou Anastasia
Heyvaert Pieter
Ongenae Femke
Vander Sande Miel
Verborgh Ruben
Publication venue: 'American College of Medical Physics (ACMP)'
Publication date: 01/01/2020

Ghent University Academic Bibliography