90 research outputs found

    Enabling automatic provenance-based trust assessment of web content

    Get PDF

    A model of provenance applied to biodiversity datasets

    Get PDF
    Nowadays, the Web has become one of the main sources of biodiversity information. An increasing number of biodiversity research institutions add new specimens and their related information to their biological collections and make this information available on the Web. However, mechanisms which are currently available provide insufficient provenance of biodiversity information. In this paper, we propose a new biodiversity provenance model extending the W3C PROV Data Model. Biodiversity data is mapped to terms from relevant ontologies, such as Dublin Core and GeoSPARQL, stored in triple stores and queried using SPARQL endpoints. Additionally, we provide a use case using our provenance model to enrich collection data

    Web-scale provenance reconstruction of implicit information diffusion on social media

    Get PDF
    Fast, massive, and viral data diffused on social media affects a large share of the online population, and thus, the (prospective) information diffusion mechanisms behind it are of great interest to researchers. The (retrospective) provenance of such data is equally important because it contributes to the understanding of the relevance and trustworthiness of the information. Furthermore, computing provenance in a timely way is crucial for particular use cases and practitioners, such as online journalists that promptly need to assess specific pieces of information. Social media currently provide insufficient mechanisms for provenance tracking, publication and generation, while state-of-the-art on social media research focuses mainly on explicit diffusion mechanisms (like retweets in Twitter or reshares in Facebook).The implicit diffusion mechanisms remain understudied due to the difficulties of being captured and properly understood. From a technical side, the state of the art for provenance reconstruction evaluates small datasets after the fact, sidestepping requirements for scale and speed of current social media data. In this paper, we investigate the mechanisms of implicit information diffusion by computing its fine-grained provenance. We prove that explicit mechanisms are insufficient to capture influence and our analysis unravels a significant part of implicit interactions and influence in social media. Our approach works incrementally and can be scaled up to cover a truly Web-scale scenario like major events. We can process datasets consisting of up to several millions of messages on a single machine at rates that cover bursty behaviour, without compromising result quality. By doing that, we provide to online journalists and social media users in general, fine grained provenance reconstruction which sheds lights on implicit interactions not captured by social media providers. These results are provided in an online fashion which also allows for fast relevance and trustworthiness assessment

    Graph Data-Models and Semantic Web Technologies in Scholarly Digital Editing

    Get PDF
    This volume is based on the selected papers presented at the Workshop on Scholarly Digital Editions, Graph Data-Models and Semantic Web Technologies, held at the Uni- versity of Lausanne in June 2019. The Workshop was organized by Elena Spadini (University of Lausanne) and Francesca Tomasi (University of Bologna), and spon- sored by the Swiss National Science Foundation through a Scientific Exchange grant, and by the Centre de recherche sur les lettres romandes of the University of Lausanne. The Workshop comprised two full days of vibrant discussions among the invited speakers, the authors of the selected papers, and other participants.1 The acceptance rate following the open call for papers was around 60%. All authors – both selected and invited speakers – were asked to provide a short paper two months before the Workshop. The authors were then paired up, and each pair exchanged papers. Paired authors prepared questions for one another, which were to be addressed during the talks at the Workshop; in this way, conversations started well before the Workshop itself. After the Workshop, the papers underwent a second round of peer-review before inclusion in this volume. This time, the relevance of the papers was not under discus- sion, but reviewers were asked to appraise specific aspects of each contribution, such as its originality or level of innovation, its methodological accuracy and knowledge of the literature, as well as more formal parameters such as completeness, clarity, and coherence. The bibliography of all of the papers is collected in the public Zotero group library GraphSDE20192, which has been used to generate the reference list for each contribution in this volume. The invited speakers came from a wide range of backgrounds (academic, commer- cial, and research institutions) and represented the different actors involved in the remediation of our cultural heritage in the form of graphs and/or in a semantic web en- vironment. Georg Vogeler (University of Graz) and Ronald Haentjens Dekker (Royal Dutch Academy of Sciences, Humanities Cluster) brought the Digital Humanities research perspective; the work of Hans Cools and Roberta Laura Padlina (University of Basel, National Infrastructure for Editions), as well as of Tobias Schweizer and Sepi- deh Alassi (University of Basel, Digital Humanities Lab), focused on infrastructural challenges and the development of conceptual and software frameworks to support re- searchers’ needs; Michele Pasin’s contribution (Digital Science, Springer Nature) was informed by his experiences in both academic research, and in commercial technology companies that provide services for the scientific community. The Workshop featured not only the papers of the selected authors and of the invited speakers, but also moments of discussion between interested participants. In addition to the common Q&A time, during the second day one entire session was allocated to working groups delving into topics that had emerged during the Workshop. Four working groups were created, with four to seven participants each, and each group presented a short report at the end of the session. Four themes were discussed: enhancing TEI from documents to data; ontologies for the Humanities; tools and infrastructures; and textual criticism. All of these themes are represented in this volume. The Workshop would not have been of such high quality without the support of the members of its scientific committee: Gioele Barabucci, Fabio Ciotti, Claire Clivaz, Marion Rivoal, Greta Franzini, Simon Gabay, Daniel Maggetti, Frederike Neuber, Elena Pierazzo, Davide Picca, Michael Piotrowski, Matteo Romanello, Maïeul Rouquette, Elena Spadini, Francesca Tomasi, Aris Xanthos – and, of course, the support of all the colleagues and administrative staff in Lausanne, who helped the Workshop to become a reality. The final versions of these papers underwent a single-blind peer review process. We want to thank the reviewers: Helena Bermudez Sabel, Arianna Ciula, Marilena Daquino, Richard Hadden, Daniel Jeller, Tiziana Mancinelli, Davide Picca, Michael Piotrowski, Patrick Sahle, Raffaele Viglianti, Joris van Zundert, and others who preferred not to be named personally. Your input enhanced the quality of the volume significantly! It is sad news that Hans Cools passed away during the production of the volume. We are proud to document a recent state of his work and will miss him and his ability to implement the vision of a digital scholarly edition based on graph data-models and semantic web technologies. The production of the volume would not have been possible without the thorough copy-editing and proof reading by Lucy Emmerson and the support of the IDE team, in particular Bernhard Assmann, the TeX-master himself. This volume is sponsored by the University of Bologna and by the University of Lausanne. Bologna, Lausanne, Graz, July 2021 Francesca Tomasi, Elena Spadini, Georg Vogele

    Tracking social provenance in chains of retweets

    Get PDF
    In the era of massive sharing of information, the term social provenance is used to denote the ownership, source or origin of a piece of information which has been propagated through social media. Tracking the provenance of information is becoming increasingly important as social platforms acquire more relevance as source of news. In this scenario, Twitter is considered one of the most important social networks for information sharing and dissemination which can be accelerated through the use of retweets and quotes. However, the Twitter API does not provide a complete tracking of the retweet chains, since only the connection between a retweet and the original post is stored, while all the intermediate connections are lost. This can limit the ability to track the diffusion of information as well as the estimation of the importance of specific users, who can rapidly become influencers, in the news dissemination. This paper proposes an innovative approach for rebuilding the possible chains of retweets and also providing an estimation of the contributions given by each user in the information spread. For this purpose, we define the concept of Provenance Constraint Network and a modified version of the Path Consistency Algorithm. An application of the proposed technique to a real-world dataset is presented at the end of the paper

    Propelling the Potential of Enterprise Linked Data in Austria. Roadmap and Report

    Get PDF
    In times of digital transformation and considering the potential of the data-driven economy, it is crucial that data is not only made available, data sources can be trusted, but also data integrity can be guaranteed, necessary privacy and security mechanisms are in place, and data and access comply with policies and legislation. In many cases, complex and interdisciplinary questions cannot be answered by a single dataset and thus it is necessary to combine data from multiple disparate sources. However, because most data today is locked up in isolated silos, data cannot be used to its fullest potential. The core challenge for most organisations and enterprises in regards to data exchange and integration is to be able to combine data from internal and external data sources in a manner that supports both day to day operations and innovation. Linked Data is a promising data publishing and integration paradigm that builds upon standard web technologies. It supports the publishing of structured data in a semantically explicit and interlinked manner such that it can be easily connected, and consequently becomes more interoperable and useful. The PROPEL project - Propelling the Potential of Enterprise Linked Data in Austria - surveyed technological challenges, entrepreneurial opportunities, and open research questions on the use of Linked Data in a business context and developed a roadmap and a set of recommendations for policy makers, industry, and the research community. Shifting away from a predominantly academic perspective and an exclusive focus on open data, the project looked at Linked Data as an emerging disruptive technology that enables efficient enterprise data management in the rising data economy. Current market forces provide many opportunities, but also present several data and information management challenges. Given that Linked Data enables advanced analytics and decision-making, it is particularly suitable for addressing today's data and information management challenges. In our research, we identified a variety of highly promising use cases for Linked Data in an enterprise context. Examples of promising application domains include "customization and customer relationship management", "automatic and dynamic content production, adaption and display", "data search, information retrieval and knowledge discovery", as well as "data and information exchange and integration". The analysis also revealed broad potential across a large spectrum of industries whose structural and technological characteristics align well with Linked Data characteristics and principles: energy, retail, finance and insurance, government, health, transport and logistics, telecommunications, media, tourism, engineering, and research and development rank among the most promising industries for the adoption of Linked Data principles. In addition to approaching the subject from an industry perspective, we also examined the topics and trends emerging from the research community in the field of Linked Data and the Semantic Web. Although our analysis revolved around a vibrant and active community composed of academia and leading companies involved in semantic technologies, we found that industry needs and research discussions are somewhat misaligned. Whereas some foundation technologies such as knowledge representation and data creation/publishing/sharing, data management and system engineering are highly represented in scientific papers, specific topics such as recommendations, or cross-topics such as machine learning or privacy and security are marginally present. Topics such as big/large data and the internet of things are (still) on an upward trajectory in terms of attention. In contrast, topics that are very relevant for industry such as application oriented topics or those that relate to security, privacy and robustness are not attracting much attention. When it comes to standardisation efforts, we identified a clear need for a more in-depth analysis into the effectiveness of existing standards, the degree of coverage they provide with respect the foundations they belong to, and the suitability of alternative standards that do not fall under the core Semantic Web umbrella. Taking into consideration market forces, sector analysis of Linked Data potential, demand side analysis and the current technological status it is clear that Linked Data has a lot of potential for enterprises and can act as a key driver of technological, organizational, and economic change. However, in order to ensure a solid foundation for Enterprise Linked Data include there is a need for: greater awareness surrounding the potential of Linked Data in enterprises, lowering of entrance barriers via education and training, better alignment between industry demands and research activities, greater support for technology transfer from universities to companies. The PROPEL roadmap recommends concrete measures in order to propel the adoption of Linked Data in Austrian enterprises. These measures are structured around five fields of activities: "awareness and education", "technological innovation, research gaps, standardisation", "policy and legal", and "funding". Key short-term recommendations include the clustering of existing activities in order to raise visibility on an international level, the funding of key topics that are under represented by the community, and the setup of joint projects. In the medium term, we recommend the strengthening of existing academic and private education efforts via certification and to establish flagship projects that are based on national use cases that can serve as blueprints for transnational initiatives. This requires not only financial support, but also infrastructure support, such as data and services to build solutions on top. In the long term, we recommend cooperation with international funding schemes to establish and foster a European level agenda, and the setup of centres of excellence

    A Survey of the First 20 Years of Research on Semantic Web and Linked Data

    Get PDF
    International audienceThis paper is a survey of the research topics in the field of Semantic Web, Linked Data and Web of Data. This study looks at the contributions of this research community over its first twenty years of existence. Compiling several bibliographical sources and bibliometric indicators , we identify the main research trends and we reference some of their major publications to provide an overview of that initial period. We conclude with some perspectives for the future research challenges.Cet article est une étude des sujets de recherche dans le domaine du Web sémantique, des données liées et du Web des données. Cette étude se penche sur les contributions de cette communauté de recherche au cours de ses vingt premières années d'existence. En compilant plusieurs sources bibliographiques et indicateurs bibliométriques, nous identifions les principales tendances de la recherche et nous référençons certaines de leurs publications majeures pour donner un aperçu de cette période initiale. Nous concluons avec une discussion sur les tendances et perspectives de recherche

    18th SC@RUG 2020 proceedings 2020-2021

    Get PDF
    • …
    corecore