    Security Aspects in Web of Data Based on Trust Principles. A brief of Literature Review

    Within the scientific community, there is a certain consensus that "Big Data" denotes a global set of data, built through a complex integration that embraces several dimensions: research data, Open Data, Linked Data, Social Network Data, etc. These data are scattered across different sources, a mix that reflects diverse philosophies, a great diversity of structures, different denominations, and more. Managing them poses great technological and methodological challenges: the discovery and selection of data, their extraction and final processing, preservation, visualization, possibilities of access, and greater or lesser structuring, among other aspects, which together form a huge domain of study at the level of analysis and implementation in different knowledge domains. However, given the availability of data and the possibility of opening them up: what problems does opening data face? This paper presents a literature review of these security aspects.

    Hybrid human-AI driven open personalized education

    Attaining those skills that match labor market demand is getting increasingly complicated as prerequisite knowledge, skills, and abilities are evolving dynamically through an uncontrollable and seemingly unpredictable process. Furthermore, people's interest in gaining knowledge pertaining to their personal lives (e.g., hobbies and life-hacks) has also increased dramatically in recent decades. In this situation, anticipating and addressing learning needs is a fundamental challenge for twenty-first-century education. The need for such technologies has escalated due to the COVID-19 pandemic, during which online education became a key player in all types of training programs. The burgeoning availability of data, not only on the demand side but also on the supply side (in the form of open/free educational resources), coupled with smart technologies, may provide fertile ground for addressing this challenge. Therefore, this thesis aims to contribute to the literature on the utilization of (open and free online) educational resources toward goal-driven personalized informal learning by developing a novel Human-AI based system, called eDoer. In this thesis, we discuss all the new knowledge that was created in order to complete the system development, which includes 1) prototype development and qualitative user validation, 2) decomposing the preliminary requirements into meaningful components, 3) implementation and validation of each component, and 4) a final requirement analysis followed by combining the implemented components in order to develop and validate the planned system (eDoer). All in all, our proposed system 1) derives the skill requirements for a wide range of occupations (as skills and jobs are typical goals in informal learning) through an analysis of online job vacancy announcements, 2) decomposes skills into learning topics, 3) collects a variety of open/free online educational resources that address those topics, 4) checks the quality of those resources and their topic relevance using our developed intelligent prediction models, 5) helps learners to set their learning goals, 6) recommends personalized learning pathways and learning content based on individual learning goals, and 7) provides assessment services for learners to monitor their progress towards their desired learning objectives. Accordingly, we created a learning dashboard focusing on three Data Science related jobs and conducted an initial validation of eDoer through a randomized experiment. Controlling for the effects of prior knowledge as assessed by the pretest, the randomized experiment provided tentative support for the hypothesis that learners who engaged with personal eDoer recommendations attain higher scores on the posttest than those who did not. The hypothesis that learners who received personalized content in terms of format, length, level of detail, and content type would achieve higher scores than those receiving non-personalized content was not supported by a statistically significant result.
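    As a purely illustrative sketch of the resource-to-topic matching step described above (eDoer's actual prediction models, corpora, and interfaces are not specified in this abstract), the following Python snippet ranks a few hypothetical open educational resources against learning topics by TF-IDF text similarity and assembles a simple pathway; all names and data are assumptions.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy stand-ins for the pipeline's inputs: learning topics derived from a skill,
# and candidate open/free educational resources with short descriptions.
topics = ["linear regression", "data visualisation with python", "sql joins"]
resources = {
    "Intro to regression (video)": "ordinary least squares and linear regression basics",
    "Matplotlib crash course": "plotting and data visualisation in python with matplotlib",
    "Joining tables in SQL": "inner and outer joins in sql with examples",
}

# Rank resources against each topic by text similarity, as a crude stand-in
# for a learned topic-relevance prediction model.
vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform(list(resources.values()) + topics)
res_vecs, topic_vecs = matrix[: len(resources)], matrix[len(resources):]
scores = cosine_similarity(topic_vecs, res_vecs)

titles = list(resources)
pathway = [(topic, titles[row.argmax()]) for topic, row in zip(topics, scores)]
print(pathway)  # one recommended resource per topic, in topic order
```

    In eDoer itself, the abstract states that relevance and quality checks are performed by developed prediction models rather than plain text similarity, but the overall pipeline shape (topics in, ranked resources and a pathway out) is the same idea.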

    Automatic Generation of Personalized Recommendations in eCoaching

    This thesis deals with eCoaching for personal lifestyle support in real time using information and communication technology. The challenge is to design, develop, and technically evaluate a prototype of an intelligent eCoach that automatically generates personal, evidence-based recommendations for a better lifestyle. The developed solution focuses on improving physical activity. The prototype uses wearable medical activity sensors. The collected data are represented semantically, and artificial-intelligence algorithms automatically generate meaningful, personal, and context-based recommendations for reducing sedentary time. The thesis uses the well-established design science research methodology to develop theoretical foundations and practical implementations. Overall, this research focuses on technological verification rather than clinical evaluation.
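    As a hedged illustration of the kind of rule that could turn activity-sensor data into a context-based recommendation (the thesis' semantic representation and reasoning are far richer than this), the Python sketch below nudges the user after a prolonged sedentary bout; the threshold, labels, and message are assumptions.

```python
from datetime import datetime, timedelta

def sedentary_nudge(samples, limit=timedelta(minutes=60)):
    """samples: list of (timestamp, label) with label "sedentary" or "active".
    Return a recommendation string once a sedentary bout exceeds the limit, else None."""
    bout_start = None
    for ts, label in samples:
        if label == "sedentary":
            bout_start = bout_start or ts          # remember when the bout began
            if ts - bout_start >= limit:
                return f"You have been sedentary since {bout_start:%H:%M}: time for a short walk."
        else:
            bout_start = None                      # any activity resets the bout
    return None

# Eight sedentary samples, ten minutes apart, trigger the nudge after an hour.
samples = [(datetime(2024, 1, 1, 9, 0) + timedelta(minutes=10 * i), "sedentary") for i in range(8)]
print(sedentary_nudge(samples))
```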

    Towards a human-centric data economy

    Spurred by the widespread adoption of artificial intelligence and machine learning, “data” is becoming a key production factor, comparable in importance to capital, land, or labour in an increasingly digital economy. In spite of an ever-growing demand for third-party data in the B2B market, firms are generally reluctant to share their information. This is due to the unique characteristics of “data” as an economic good (a freely replicable, non-depletable asset holding a highly combinatorial and context-specific value), which moves digital companies to hoard and protect their “valuable” data assets, and to integrate across the whole value chain seeking to monopolise the provision of innovative services built upon them. As a result, most of those valuable assets still remain unexploited in corporate silos nowadays. This situation is shaping the so-called data economy around a number of champions, and it is hampering the benefits of a global data exchange on a large scale. Some analysts have estimated the potential value of the data economy at US$2.5 trillion globally by 2025. Not surprisingly, unlocking the value of data has become a central policy of the European Union, which also estimated the size of the data economy at €827 billion for the EU27 in the same period. Within the scope of the European Data Strategy, the European Commission is also steering relevant initiatives aimed at identifying relevant cross-industry use cases involving different verticals, and at enabling sovereign data exchanges to realise them. Among individuals, the massive collection and exploitation of personal data by digital firms in exchange for services, often with little or no consent, has raised a general concern about privacy and data protection. Apart from spurring recent legislative developments in this direction, this concern has raised some voices warning against the unsustainability of the existing digital economy (few digital champions, potential negative impact on employment, growing inequality), some of which propose that people are paid for their data in a sort of worldwide data labour market as a potential solution to this dilemma [114, 115, 155]. From a technical perspective, we are far from having the required technology and algorithms that will enable such a human-centric data economy. Even its scope is still blurry, and the question of the value of data is, to say the least, controversial. Research works from different disciplines have studied the data value chain, different approaches to the value of data, how to price data assets, and novel data marketplace designs. At the same time, complex legal and ethical issues with respect to the data economy have arisen around privacy, data protection, and ethical AI practices. In this dissertation, we start by exploring the data value chain and how entities trade data assets over the Internet. We carry out what is, to the best of our understanding, the most thorough survey of commercial data marketplaces. In this work, we have catalogued and characterised ten different business models, including those of personal information management systems, companies born in the wake of recent data protection regulations and aiming at empowering end users to take control of their data. We have also identified the challenges faced by different types of entities, and what kind of solutions and technology they are using to provide their services. Then we present a first-of-its-kind measurement study that sheds light on the prices of data in the market using a novel methodology.
We study how ten commercial data marketplaces categorise and classify data assets, and which categories of data command higher prices. We also develop classifiers for comparing data products across different marketplaces, and we study the characteristics of the most valuable data assets and the features that specific vendors use to set the price of their data products. Based on this information, and adding data products offered by 33 other data providers, we develop a regression analysis to reveal the features that correlate with the prices of data products. As a result, we also implement the basic building blocks of a novel data pricing tool capable of providing a hint of the market price of a new data product using just its metadata as input. This tool would provide more transparency on the prices of data products in the market, which will help in pricing data assets and in avoiding the inherent price fluctuation of nascent markets. Next we turn to topics related to data marketplace design. In particular, we study how buyers can select and purchase suitable data for their tasks without requiring a priori access to such data in order to make a purchase decision, and how marketplaces can distribute the payoff of a data transaction that combines data from different sources among the corresponding providers, be they individuals or firms. The difficulty of both problems is further exacerbated in a human-centric data economy, where buyers have to choose among the data of thousands of individuals, and where marketplaces have to distribute payoffs to thousands of people contributing personal data to a specific transaction. Regarding the selection process, we compare different purchase strategies depending on the level of information available to data buyers at the time of making decisions. A first methodological contribution of our work is proposing a data evaluation stage prior to datasets being selected and purchased by buyers in a marketplace. We show that buyers can significantly improve the performance of the purchasing process just by being provided with a measurement of the performance of their models when trained by the marketplace with individual eligible datasets. We design purchase strategies that exploit this functionality, and we call the resulting algorithm Try Before You Buy; our work demonstrates over synthetic and real datasets that it can lead to near-optimal data purchasing in only O(N) execution time, instead of the exponential O(2^N) time needed to calculate the optimal purchase. With regard to the payoff distribution problem, we focus on computing the relative value of spatio-temporal datasets combined in marketplaces for predicting transportation demand and travel time in metropolitan areas. Using large datasets of taxi rides from Chicago, Porto and New York, we show that the value of data is different for each individual and cannot be approximated by its volume. Our results reveal that even more complex approaches based on the “leave-one-out” value are inaccurate. Instead, more complex and acknowledged notions of value from economics and game theory, such as the Shapley value, need to be employed if one wishes to capture the complex effects of mixing different datasets on the accuracy of forecasting algorithms. However, the Shapley value entails serious computational challenges. Its exact calculation requires repetitively training and evaluating every combination of data sources and hence O(N!)
or O(2^N) computational time, which is infeasible for complex models or thousands of individuals. Moreover, our work paves the way to new methods of measuring the value of spatio-temporal data. We identify heuristics such as entropy or similarity to the average that show a significant correlation with the Shapley value and can therefore be used to overcome the significant computational challenges posed by Shapley approximation algorithms in this specific context. We conclude with a number of open issues and propose further research directions that leverage the contributions and findings of this dissertation. These include monitoring data transactions to better measure data markets, and complementing market data with actual transaction prices to build a more accurate data pricing tool. A human-centric data economy would also require that the contributions of thousands of individuals to machine learning tasks are calculated daily. For that to be feasible, we need to further optimise the efficiency of the data purchasing and payoff calculation processes in data marketplaces. In that direction, we also point to some alternatives to repetitively training and evaluating a model in order to select data based on Try Before You Buy and to approximate the Shapley value. Finally, we discuss the challenges and potential technologies that could help with building a federation of standardised data marketplaces. The data economy will develop fast in the upcoming years, and researchers from different disciplines will work together to unlock the value of data and make the most out of it. Maybe the proposal of getting paid for our data and our contribution to the data economy finally flies, or maybe it is other proposals, such as the robot tax, that are finally used to balance the power between individuals and tech firms in the digital economy. Still, we hope our work sheds light on the value of data, and contributes to making the price of data more transparent and, eventually, to moving towards a human-centric data economy. This work has been supported by IMDEA Networks Institute. Programa de Doctorado en Ingeniería Telemática por la Universidad Carlos III de Madrid. Presidente: Georgios Smaragdakis. Secretario: Ángel Cuevas Rumín. Vocal: Pablo Rodríguez Rodríguez.
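    To make the Shapley-based payoff distribution discussed above concrete, here is a minimal Python sketch that computes exact Shapley values for three hypothetical data providers, where a coalition's worth is the accuracy of a model trained on its pooled data; the provider names and the toy accuracy table are assumptions, and the exact computation shown is precisely what becomes infeasible at the scale of thousands of individuals.

```python
from itertools import combinations
from math import factorial

def shapley_values(providers, utility):
    """Exact Shapley values. utility(frozenset_of_providers) -> coalition value.
    Needs a utility evaluation for every subset, so it only works for small N."""
    n = len(providers)
    values = {p: 0.0 for p in providers}
    for p in providers:
        others = [q for q in providers if q != p]
        for k in range(n):
            for coalition in combinations(others, k):
                s = frozenset(coalition)
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                values[p] += weight * (utility(s | {p}) - utility(s))
    return values

# Toy utility: accuracy of a forecasting model trained on a coalition's pooled data.
# Here it is a lookup table; in practice this is the expensive step (retraining the
# model for every subset of data sources).
toy_accuracy = {
    frozenset(): 0.50,
    frozenset({"A"}): 0.62, frozenset({"B"}): 0.58, frozenset({"C"}): 0.55,
    frozenset({"A", "B"}): 0.70, frozenset({"A", "C"}): 0.66,
    frozenset({"B", "C"}): 0.60, frozenset({"A", "B", "C"}): 0.74,
}

print(shapley_values(["A", "B", "C"], toy_accuracy.__getitem__))
```

    The exponential number of utility evaluations is why the dissertation turns to heuristics such as entropy or similarity to the average, which correlate with the Shapley value but avoid retraining a model for every subset of sources.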

    Characteristic sets profile features: Estimation and application to SPARQL query planning

    RDF dataset profiling is the task of extracting a formal representation of a dataset’s features. Such features may cover various aspects of the RDF dataset, ranging from information on licensing and provenance to statistical descriptors of the data distribution and its semantics. In this work, we focus on the characteristic sets profile feature, which captures both structural and semantic information of an RDF dataset, making it a valuable resource for different downstream applications. While previous research demonstrated the benefits of characteristic sets in centralized and federated query processing, access to these fine-grained statistics is taken for granted. However, especially in federated query processing, computing this profile feature is challenging, as it can be difficult and/or costly to access and process the entire data from all federation members. We address this shortcoming by introducing the concept of profile feature estimation and propose a sampling-based approach to generate estimations of the characteristic sets profile feature. In addition, we showcase the applicability of these feature estimations in federated querying by proposing a query planning approach that is specifically designed to leverage them. In our first experimental study, we intrinsically evaluate our approach with respect to the representativeness of the feature estimation. The results show that even small samples of just 0.5% of the original graph’s entities allow for estimating both structural and statistical properties of the characteristic sets profile feature. Our second experimental study extrinsically evaluates the estimations by investigating their applicability in our query planner using the well-known FedBench benchmark. The results of the experiments show that the estimated profile features allow for obtaining efficient query plans.
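    To make the profile feature concrete: the characteristic set of a subject is the set of predicates it occurs with, and the profile counts how many subjects share each set. The Python sketch below computes these counts from a list of triples and estimates them from a small entity sample, in the spirit of the sampling-based approach above; the simple scaling rule is an assumption for illustration, not necessarily the projection function used in the paper.

```python
import random
from collections import Counter

def characteristic_sets(triples):
    """Map each subject to its characteristic set (the set of predicates it uses),
    then count how many subjects share each set."""
    preds_per_subject = {}
    for s, p, o in triples:
        preds_per_subject.setdefault(s, set()).add(p)
    return Counter(frozenset(ps) for ps in preds_per_subject.values())

def estimate_profile(triples, sample_ratio=0.005, seed=42):
    """Estimate characteristic set counts from a sample of the subjects,
    scaling the sampled counts up to the full subject population."""
    subjects = list({s for s, _, _ in triples})
    random.seed(seed)
    sample = set(random.sample(subjects, max(1, int(len(subjects) * sample_ratio))))
    sampled_counts = characteristic_sets([t for t in triples if t[0] in sample])
    scale = len(subjects) / len(sample)
    return {cs: count * scale for cs, count in sampled_counts.items()}
```

    A federated query planner can then use such estimated counts as cardinality estimates when deciding how to order joins over the federation members, which is the kind of use the extrinsic evaluation above targets.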

    Semantic Data Management in Data Lakes

    In recent years, data lakes have emerged as a way to manage large amounts of heterogeneous data for modern data analytics. One way to prevent data lakes from turning into inoperable data swamps is semantic data management. Some approaches propose the linkage of metadata to knowledge graphs based on the Linked Data principles to provide more meaning and semantics to the data in the lake. Such a semantic layer may be utilized not only for data management but also to tackle the problem of data integration from heterogeneous sources, in order to make data access more expressive and interoperable. In this survey, we review recent approaches with a specific focus on their application within data lake systems and their scalability to Big Data. We classify the approaches into (i) basic semantic data management, (ii) semantic modeling approaches for enriching metadata in data lakes, and (iii) methods for ontology-based data access. In each category, we cover the main techniques and their background, and compare the latest research. Finally, we point out challenges for future work in this research area, which needs a closer integration of Big Data and Semantic Web technologies.
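    As a small, hedged illustration of such a semantic metadata layer (the surveyed systems use their own ontologies and architectures), the Python sketch below uses rdflib to describe a hypothetical file in a lake and link one of its columns to a knowledge-graph concept; all IRIs and terms are invented for the example.

```python
from rdflib import Graph, Namespace, URIRef, Literal
from rdflib.namespace import RDF, DCTERMS

# Hypothetical vocabulary for data lake metadata; real systems surveyed in the
# paper use their own ontologies, this only illustrates the idea of a semantic layer.
LAKE = Namespace("http://example.org/lake#")
DBR = Namespace("http://dbpedia.org/resource/")

meta = Graph()
datafile = URIRef("http://example.org/lake/raw/sales_2023.parquet")
column = LAKE.sales_2023_country

meta.add((datafile, RDF.type, LAKE.DataFile))
meta.add((datafile, DCTERMS.title, Literal("Sales 2023 (raw)")))
meta.add((datafile, LAKE.hasColumn, column))
# Link the column to a knowledge-graph concept, following the Linked Data principles:
meta.add((column, DCTERMS.subject, DBR.Country))

print(meta.serialize(format="turtle"))
```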

    Machine Learning Algorithm for the Scansion of Old Saxon Poetry

    Several scholars have designed tools to perform the automatic scansion of poetry in many languages, but none of these tools deal with Old Saxon or Old English. This project aims to be a first attempt at creating a tool for these languages. We implemented a Bidirectional Long Short-Term Memory (BiLSTM) model to perform the automatic scansion of Old Saxon and Old English poems. Since this model uses supervised learning, we manually annotated the Heliand manuscript and used the resulting corpus as the labeled dataset to train the model. The evaluation of the algorithm's performance reached 97% accuracy and a 99% weighted average for precision, recall, and F1 score. In addition, we tested the model on some verses from the Old Saxon Genesis and some from The Battle of Brunanburh, and we observed that the model predicted almost all Old Saxon metrical patterns correctly but misclassified the majority of the Old English input verses.
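    For readers unfamiliar with the architecture, the following Python (PyTorch) sketch shows a minimal BiLSTM classifier of the kind described above, mapping a character-encoded verse to one of a handful of metrical-pattern labels; the tokenisation, label set, and hyperparameters are assumptions, since the abstract does not specify them.

```python
import torch
import torch.nn as nn

class VerseScanner(nn.Module):
    """Minimal BiLSTM classifier: a verse (sequence of character ids) -> one of a
    small set of metrical-pattern labels. Hyperparameters are purely illustrative."""
    def __init__(self, vocab_size, n_patterns, emb_dim=64, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.bilstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
        self.classify = nn.Linear(2 * hidden, n_patterns)

    def forward(self, char_ids):                        # (batch, seq_len)
        x = self.embed(char_ids)                        # (batch, seq_len, emb_dim)
        _, (h_n, _) = self.bilstm(x)                    # h_n: (2, batch, hidden)
        verse_repr = torch.cat([h_n[0], h_n[1]], dim=-1)  # forward + backward final states
        return self.classify(verse_repr)                # (batch, n_patterns) logits

# Example: a batch of 2 padded verses, a 40-character vocabulary, 5 metrical patterns.
model = VerseScanner(vocab_size=40, n_patterns=5)
dummy_verses = torch.randint(1, 40, (2, 30))
logits = model(dummy_verses)
loss = nn.CrossEntropyLoss()(logits, torch.tensor([0, 3]))
```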

    Cognition-Based Evaluation of Visualisation Frameworks for Exploring Structured Cultural Heritage Data

    It is often claimed that Information Visualisation (InfoVis) tools improve the audience’s engagement with the display of cultural heritage (CH) collections, open up CH content to new audiences, and support teaching and learning through interactive experiences. But there is a lack of studies systematically evaluating these claims, particularly from the perspective of modern educational theory. As far as the author is aware, no experimental investigation has been undertaken until now that attempts to measure deeper levels of user engagement and learning with InfoVis tools. The investigation of this thesis complements InfoVis research by initiating a human-centric approach, since little previous research has attempted to incorporate and integrate human cognition as one of the fundamental components of InfoVis. In this thesis, using Bloom’s taxonomy of learning objectives as well as individual learning characteristics (i.e. cognitive preferences), I have evaluated the visitor experience of an art collection both with and without InfoVis tools (between-subjects design). Results indicate that whilst InfoVis tools have some positive effect on the lower levels of learning, they are less effective for the higher levels. In addition, this thesis shows that InfoVis tools seem to be more effective when they match specific cognitive preferences. These results have implications both for the designers of tools and for CH venues in terms of expectations of effectiveness and exhibition design; the proposed cognition-based evaluation framework and the results of this investigation could provide a valuable baseline for assessing the effectiveness of visitors’ interaction with the artifacts of online and physical exhibitions where InfoVis tools such as timelines and maps, along with storytelling techniques, are being used.

    Managing healthcare transformation towards P5 medicine (Published in Frontiers in Medicine)

    Health and social care systems around the world are facing radical organizational, methodological and technological paradigm changes to meet the requirements for improving quality and safety of care as well as efficiency and efficacy of care processes. In doing so, they are trying to manage the challenges of ongoing demographic change towards aging, multi-diseased societies, the development of human resources, health and social services consumerism, medical and biomedical progress, and exploding costs for health-related R&D as well as health services delivery. Furthermore, they intend to achieve sustainability of global health systems by transforming them into intelligent, adaptive and proactive systems focusing on health and wellness with optimized quality and safety outcomes. The outcome is a transformed health and wellness ecosystem combining the approaches of translational medicine, 5P medicine (personalized, preventive, predictive, participative precision medicine) and digital health towards ubiquitous personalized health services realized independently of time and location. It considers individual health status, conditions, and genetic and genomic dispositions in personal social, occupational, environmental and behavioural context, thus turning health and social care from reactive to proactive. This requires the advancement of communication and cooperation among the business actors from different domains (disciplines), with different methodologies, terminologies/ontologies, education, skills and experiences, from the data level (data sharing) to the concept/knowledge level (knowledge sharing). The challenge here is the understanding and the formal as well as consistent representation of the world of sciences and practices, i.e. of multidisciplinary and dynamic systems in variable context, for enabling mapping between the different disciplines, methodologies, perspectives, intentions, languages, etc. Based on a framework for representing multi-domain ecosystems, including their development process, in a dynamic, use-case-specific and context-aware way, systems, models and artefacts can be consistently represented, harmonized and integrated. The response to that problem is the formal representation of health and social care ecosystems through a system-oriented, architecture-centric, ontology-based and policy-driven model and framework, addressing all domains and development process views contributing to the system and context in question. Accordingly, this Research Topic aims to address this change towards 5P medicine. Specific areas of interest include, but are not limited to:
    • A multidisciplinary approach to the transformation of health and social systems
    • Success factors for sustainable P5 ecosystems
    • AI and robotics in transformed health ecosystems
    • Transformed health ecosystems: challenges for security, privacy and trust
    • Modelling digital health systems
    • Ethical challenges of personalized digital health
    • Knowledge representation and management of transformed health ecosystems
    Table of Contents:
    04 Editorial: Managing healthcare transformation towards P5 medicine (Bernd Blobel and Dipak Kalra)
    06 Transformation of Health and Social Care Systems—An Interdisciplinary Approach Toward a Foundational Architecture (Bernd Blobel, Frank Oemig, Pekka Ruotsalainen and Diego M. Lopez)
    26 Transformed Health Ecosystems—Challenges for Security, Privacy, and Trust (Pekka Ruotsalainen and Bernd Blobel)
    36 Success Factors for Scaling Up the Adoption of Digital Therapeutics Towards the Realization of P5 Medicine (Alexandra Prodan, Lucas Deimel, Johannes Ahlqvist, Strahil Birov, Rainer Thiel, Meeri Toivanen, Zoi Kolitsi and Dipak Kalra)
    49 EU-Funded Telemedicine Projects – Assessment of, and Lessons Learned From, in the Light of the SARS-CoV-2 Pandemic (Laura Paleari, Virginia Malini, Gabriella Paoli, Stefano Scillieri, Claudia Bighin, Bernd Blobel and Mauro Giacomini)
    60 A Review of Artificial Intelligence and Robotics in Transformed Health Ecosystems (Kerstin Denecke and Claude R. Baudoin)
    73 Modeling digital health systems to foster interoperability (Frank Oemig and Bernd Blobel)
    89 Challenges and solutions for transforming health ecosystems in low- and middle-income countries through artificial intelligence (Diego M. López, Carolina Rico-Olarte, Bernd Blobel and Carol Hullin)
    111 Linguistic and ontological challenges of multiple domains contributing to transformed health ecosystems (Markus Kreuzthaler, Mathias Brochhausen, Cilia Zayas, Bernd Blobel and Stefan Schulz)
    126 The ethical challenges of personalized digital health (Els Maeckelberghe, Kinga Zdunek, Sara Marceglia, Bobbie Farsides and Michael Rigby)

    Integration of heterogeneous data sources and automated reasoning in healthcare and domotic IoT systems

    In recent years, IoT technology has radically transformed many crucial industrial and service sectors, such as healthcare. The multi-faceted heterogeneity of the devices and of the collected information provides important opportunities to develop innovative systems and services. However, the ubiquitous presence of data silos and the poor semantic interoperability in the IoT landscape constitute a significant obstacle in the pursuit of this goal. Moreover, achieving actionable knowledge from the collected data requires IoT information sources to be analysed using appropriate artificial intelligence techniques such as automated reasoning. In this thesis work, Semantic Web technologies have been investigated as an approach to address both the data integration and the reasoning aspect in modern IoT systems. In particular, the contributions presented in this thesis are the following: (1) the IoT Fitness Ontology, an OWL ontology that has been developed in order to overcome the issue of data silos and enable semantic interoperability in the IoT fitness domain; (2) a Linked Open Data web portal for collecting and sharing IoT health datasets with the research community; (3) a novel methodology for embedding knowledge in rule-defined IoT smart home scenarios; and (4) a knowledge-based IoT home automation system that supports a seamless integration of heterogeneous devices and data sources.
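    As a hedged sketch of how Semantic Web tooling can represent and query heterogeneous IoT data (the IoT Fitness Ontology's actual IRIs and classes are not given in this abstract, so the vocabulary below is invented for illustration), here is a small Python example using rdflib that stores a sensor observation as RDF and runs a SPARQL query acting as a simple rule over it.

```python
from rdflib import Graph, Namespace, Literal, RDF
from rdflib.namespace import XSD

# Hypothetical namespace standing in for an IoT fitness/home ontology.
IOT = Namespace("http://example.org/iot-fitness#")

g = Graph()
g.add((IOT.reading42, RDF.type, IOT.HeartRateObservation))
g.add((IOT.reading42, IOT.observedBy, IOT.wristband1))
g.add((IOT.reading42, IOT.hasValue, Literal(134, datatype=XSD.integer)))

# A toy "rule": flag every heart-rate observation above a threshold.
query = """
PREFIX iot: <http://example.org/iot-fitness#>
SELECT ?obs ?v WHERE {
  ?obs a iot:HeartRateObservation ; iot:hasValue ?v .
  FILTER(?v > 120)
}
"""
for obs, v in g.query(query):
    print(f"High heart rate {v} reported by observation {obs}")
```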
    • 

    corecore