
    A Survey of Volunteered Open Geo-Knowledge Bases in the Semantic Web

    Over the past decade, rapid advances in web technologies, coupled with innovative models of spatial data collection and consumption, have generated robust growth in geo-referenced information, resulting in spatial information overload. Increasing 'geographic intelligence' in traditional text-based information retrieval has become a prominent approach to respond to this issue and to fulfill users' spatial information needs. Numerous efforts in the Semantic Geospatial Web, Volunteered Geographic Information (VGI), and the Linking Open Data initiative have converged in a constellation of open knowledge bases, freely available online. In this article, we survey these open knowledge bases, focusing on their geospatial dimension. Particular attention is devoted to the crucial issue of the quality of geo-knowledge bases, as well as of crowdsourced data. A new knowledge base, the OpenStreetMap Semantic Network, is outlined as our contribution to this area. Research directions in information integration and Geographic Information Retrieval (GIR) are then reviewed, with a critical discussion of their current limitations and future prospects.

    Same but Different: Distant Supervision for Predicting and Understanding Entity Linking Difficulty

    Entity Linking (EL) is the task of automatically identifying entity mentions in a piece of text and resolving them to a corresponding entity in a reference knowledge base such as Wikipedia. A large number of EL tools are available for different types of documents and domains, yet EL remains a challenging task: the lack of precision on particularly ambiguous mentions often spoils the usefulness of automated disambiguation results in real applications. A priori approximations of the difficulty of linking a particular entity mention can help flag critical cases in semi-automated EL systems, while detecting latent factors that affect EL performance, such as corpus-specific features, can provide insights into how to improve a system based on the special characteristics of the underlying corpus. In this paper, we first introduce a consensus-based method to generate difficulty labels for entity mentions on arbitrary corpora. The difficulty labels are then exploited as training data for a supervised classification task that predicts the EL difficulty of entity mentions using a variety of features. Experiments over a corpus of news articles show that EL difficulty can be estimated with high accuracy, and also reveal latent features that affect EL performance. Finally, evaluation results demonstrate the effectiveness of the proposed method in informing semi-automated EL pipelines.

    Comment: Preprint of paper accepted for publication in the 34th ACM/SIGAPP Symposium On Applied Computing (SAC 2019).
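    The abstract does not state how the consensus-based difficulty labels are computed. The sketch below is a hypothetical illustration only, not the authors' method: it assumes each mention has been linked independently by several EL systems, measures the share of systems that agree on the most frequent entity, and labels the mention "easy" when that share is a strict majority. The function name, input format, label names and the 0.5 threshold are all assumptions.

```python
from collections import Counter


def consensus_difficulty(predictions, threshold=0.5):
    """Hypothetical consensus labeling of entity mentions.

    predictions: dict mapping a mention id to a list with one predicted
    entity per EL system (None when a system produced no link).
    Returns a dict mapping each mention id to "easy" or "hard".
    """
    labels = {}
    for mention, entities in predictions.items():
        linked = [e for e in entities if e is not None]
        if not linked:
            # No system produced a link: treat the mention as hard.
            labels[mention] = "hard"
            continue
        # Count how many systems chose the most frequent entity.
        _, top_count = Counter(linked).most_common(1)[0]
        # Agreement is measured over all systems, including those with no link.
        agreement = top_count / len(entities)
        labels[mention] = "easy" if agreement > threshold else "hard"
    return labels


preds = {
    "m1": ["Paris", "Paris", "Paris"],
    "m2": ["Paris", "Paris,_Texas", None],
}
print(consensus_difficulty(preds))
```

    Under these assumptions, "m1" is labeled easy (all three systems agree) and "m2" hard (only one of three systems picks any single entity). Labels produced this way could then serve as training targets for a supervised classifier, as the abstract describes.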

    A survey of data mining techniques for social media analysis

    Social networks have gained remarkable attention in the last decade. Accessing social network sites such as Twitter, Facebook, LinkedIn and Google+ through the internet and Web 2.0 technologies has become more affordable. People are becoming more interested in, and more reliant on, social networks for information, news and the opinions of other users on diverse subject matters. This heavy reliance causes social network sites to generate massive data characterised by three computational issues, namely size, noise and dynamism. These issues often make social network data too complex to analyse manually, which calls for computational means of analysing them. Data mining provides a wide range of techniques for detecting useful knowledge, such as trends, patterns and rules, in massive datasets [44]. Data mining techniques are used for information retrieval, statistical modelling and machine learning. These techniques employ data pre-processing, data analysis and data interpretation processes. This survey discusses different data mining techniques used over the decades to mine diverse aspects of social networks, from historical techniques to up-to-date models, including our novel technique named TRCM. All the techniques covered in this survey are listed in Table 1, together with the tools employed and the names of their authors.

    Social Media Roadmaps. Exploring the futures triggered by social media.

    Social media refers to a combination of three elements: content, user communities and Web 2.0 technologies. This foresight report presents six roadmaps of the anticipated developments of social media in three themes: society, companies, and local environment. One of the roadmaps, the meta-roadmap, is a synthesis of all the others. The society sub-roadmap explores societal participation through communities. Three sub-roadmaps relate to companies: interacting with companies through communities, social media in the work environment, and social media enhanced shopping. The local environment sub-roadmap looks at social media in the local environment. The roadmapping process was carried out through two workshops at VTT. The results of the report are crystallized into five main development lines triggered by social media. The first development line is transparency, referring to its increasing role in society, with both positive and negative consequences. The second development line is the rise of a ubiquitous participatory communication model. This refers to an increase in two-directional and community-based interactivity in every field where it adds value. The third development line is reflexive empowerment. This refers to the role of social media as an enabler of grass-roots community collaboration. The fourth development line is the duality of personalization/fragmentation vs. mass effects/integration. Personalization/fragmentation emphasises the tailoring of web services and content. This development is counterweighted by mass effects/integration, such as the formation of super-nodes in the web. The fifth development line is the new relation between the physical and virtual worlds. This development line highlights the idea that practices induced by social media, e.g. communication, participation, co-creation, feedback and rating, will become more common in everyday environments, and that the virtual and physical worlds will be increasingly interlinked.