1,892 research outputs found

    Boilerplate Removal using a Neural Sequence Labeling Model

    Full text link
    The extraction of main content from web pages is an important task for numerous applications, ranging from usability aspects, like reader views for news articles in web browsers, to information retrieval or natural language processing. Existing approaches are lacking as they rely on large amounts of hand-crafted features for classification. This results in models that are tailored to a specific distribution of web pages, e.g. from a certain time frame, but lack in generalization power. We propose a neural sequence labeling model that does not rely on any hand-crafted features but takes only the HTML tags and words that appear in a web page as input. This allows us to present a browser extension which highlights the content of arbitrary web pages directly within the browser using our model. In addition, we create a new, more current dataset to show that our model is able to adapt to changes in the structure of web pages and outperform the state-of-the-art model.Comment: WWW20 Demo pape

    SMART-KG: Hybrid Shipping for SPARQL Querying on the Web

    Get PDF
    While Linked Data (LD) provides standards for publishing (RDF) and (SPARQL) querying Knowledge Graphs (KGs) on the Web, serving, accessing and processing such open, decentralized KGs is often practically impossible, as query timeouts on publicly available SPARQL endpoints show. Alternative solutions such as Triple Pattern Fragments (TPF) attempt to tackle the problem of availability by pushing query processing workload to the client side, but suffer from unnecessary transfer of irrelevant data on complex queries with large intermediate results. In this paper we present smart-KG, a novel approach to share the load between servers and clients, while significantly reducing data transfer volume, by combining TPF with shipping compressed KG partitions. Our evaluations show that outperforms state-of-the-art client-side solutions and increases server-side availability towards more cost-effective and balanced hosting of open and decentralized KGs.Series: Working Papers on Information Systems, Information Business and Operation

    Layered Graph Embedding for Entity Recommendation using Wikipedia in the Yahoo! Knowledge Graph

    Full text link
    In this paper, we describe an embedding-based entity recommendation framework for Wikipedia that organizes Wikipedia into a collection of graphs layered on top of each other, learns complementary entity representations from their topology and content, and combines them with a lightweight learning-to-rank approach to recommend related entities on Wikipedia. Through offline and online evaluations, we show that the resulting embeddings and recommendations perform well in terms of quality and user engagement. Balancing simplicity and quality, this framework provides default entity recommendations for English and other languages in the Yahoo! Knowledge Graph, which Wikipedia is a core subset of.Comment: 8 pages, 4 figures, 8 tables. To be appeared in Wiki Workshop 2020, Companion Proceedings of the Web Conference 2020(WWW 20 Companion), Taipei, Taiwa

    RecipeGPT: Generative pre-training based cooking recipe generation and evaluation system

    Get PDF
    National Research Foundation (NRF) Singapore under International Research Centres in Singapore Funding Initiativ

    Unveiling Coordinated Groups Behind White Helmets Disinformation

    Full text link
    Propaganda, disinformation, manipulation, and polarization are the modern illnesses of a society increasingly dependent on social media as a source of news. In this paper, we explore the disinformation campaign, sponsored by Russia and allies, against the Syria Civil Defense (a.k.a. the White Helmets). We unveil coordinated groups using automatic retweets and content duplication to promote narratives and/or accounts. The results also reveal distinct promoting strategies, ranging from the small groups sharing the exact same text repeatedly, to complex "news website factories" where dozens of accounts synchronously spread the same news from multiple sites.Comment: To be presented at WWW 2020 Workshop on Computational Methods in Online Misbehavior and forthcoming in the Companion Proceedings of the Web Conference 202

    Measuring Spatial Subdivisions in Urban Mobility with Mobile Phone Data

    Get PDF
    Urban population grows constantly. By 2050 two thirds of the world population will reside in urban areas. This growth is faster and more complex than the ability of cities to measure and plan for their sustainability. To understand what makes a city inclusive for all, we define a methodology to identify and characterize spatial subdivisions: areas with over- and under-representation of specific population groups, named hot and cold spots respectively. Using aggregated mobile phone data, we apply this methodology to the city of Barcelona to assess the mobility of three groups of people: women, elders, and tourists. We find that, within the three groups, cold spots have a lower diversity of amenities and services than hot spots. Also, cold spots of women and tourists tend to have lower population income. These insights apply to the floating population of Barcelona, thus augmenting the scope of how inclusiveness can be analyzed in the city.Comment: 10 pages, 10 figures. To be presented at the Data Science for Social Good workshop at The Web Conference 202
    • …
    corecore