Incremental schema integration for data wrangling via knowledge graphs
Virtual data integration is the prevailing approach to data wrangling in data-driven decision-making. In this paper, we focus on automating schema integration, which extracts a homogenised representation of the data source schemata and integrates them into a global schema to enable virtual data integration. Schema integration requires a set of well-known constructs: the data source schemata and wrappers, a global integrated schema, and the mappings between them. Based on these, virtual data integration systems enable fast and on-demand data exploration via query rewriting. Unfortunately, the generation of such constructs is currently performed in a largely manual manner, hindering its feasibility in real scenarios. This is aggravated when dealing with heterogeneous and evolving data sources. To overcome these issues, we propose a fully-fledged semi-automatic and incremental approach grounded on knowledge graphs to generate the required schema integration constructs in four main steps: bootstrapping, schema matching, schema integration, and generation of system-specific constructs. We also present NextiaDI, a tool implementing our approach. Finally, a comprehensive evaluation is presented to scrutinize our approach.

This work was partly supported by the DOGO4ML project, funded by the Spanish Ministerio de Ciencia e Innovación under project PID2020-117191RB-I00, and the D3M project, funded by the Spanish Agencia Estatal de Investigación (AEI) under project PDC2021-121195-I00. Javier Flores is supported by contract 2020-DI-027 of the Industrial Doctorate Program of the Government of Catalonia and Consejo Nacional de Ciencia y Tecnología (CONACYT, Mexico). Sergi Nadal is partly supported by the Spanish Ministerio de Ciencia e Innovación, as well as the European Union – NextGenerationEU, under project FJC2020-045809-I.
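As a rough, hypothetical illustration of the schema matching step described above (this is not NextiaDI's actual API; the attribute names, tokenization, and the Jaccard threshold are all assumptions for the sketch), a minimal name-based matcher over two heterogeneous source schemata might look like this:

```python
# Minimal sketch of name-based schema matching between two source schemata.
# Attribute names are tokenized and compared with Jaccard similarity; pairs
# above a threshold become candidate correspondences for the global schema.
import re

def tokens(name: str) -> set[str]:
    """Split an attribute name like 'customerId' or 'customer_id' into tokens."""
    parts = re.split(r"[_\W]+|(?<=[a-z])(?=[A-Z])", name)
    return {p.lower() for p in parts if p}

def jaccard(a: set[str], b: set[str]) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0

def match_schemata(schema_a, schema_b, threshold=0.5):
    """Return candidate attribute correspondences as (attr_a, attr_b, score)."""
    candidates = []
    for attr_a in schema_a:
        for attr_b in schema_b:
            score = jaccard(tokens(attr_a), tokens(attr_b))
            if score >= threshold:
                candidates.append((attr_a, attr_b, round(score, 2)))
    return sorted(candidates, key=lambda c: -c[2])

# Two hypothetical source schemata with different naming conventions.
source_a = ["customerId", "fullName", "emailAddress"]
source_b = ["customer_id", "name_full", "email"]
print(match_schemata(source_a, source_b))
```

A real matcher would also exploit instance data, data types, and the knowledge graph itself; name similarity is only the simplest signal in the pipeline.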
The Politics of Platformization: Amsterdam Dialogues on Platform Theory
What is platformization and why is it a relevant category in the contemporary political landscape? How is it related to cybernetics and the history of computation? This book tries to answer such questions by engaging in multidisciplinary dialogues about the first ten years of the emerging fields of platform studies and platform theory. It deploys a narrative and playful approach that makes use of anecdotes, personal histories, etymologies, and speculations about possible futures to investigate both the fragmented genealogy that led to platformization and the organizational and economic trends that shape today's platform sociotechnical imaginaries.
A Survey of Graph-based Deep Learning for Anomaly Detection in Distributed Systems
Anomaly detection is a crucial task in complex distributed systems. A thorough understanding of the requirements and challenges of anomaly detection is pivotal to the security of such systems, especially for real-world deployment. While there are many works and application domains that deal with this problem, few have attempted to provide an in-depth look at such systems. In this survey, we explore the potential of graph-based algorithms to identify anomalies in distributed systems. These systems can be heterogeneous or homogeneous, which can result in distinct requirements. One of our objectives is to provide an in-depth look at graph-based approaches to conceptually analyze their capability to handle real-world challenges such as heterogeneity and dynamic structure. This study gives an overview of the State-of-the-Art (SotA) research articles in the field and compares and contrasts their characteristics. To facilitate a more comprehensive understanding, we present three systems with varying abstractions as use cases. We examine the specific challenges involved in anomaly detection within such systems. Subsequently, we elucidate the efficacy of graphs in such systems and explicate their advantages. We then delve into the SotA methods and highlight their strengths and weaknesses, pointing out areas for possible improvements and future work.

Comment: The first two authors (A. Danesh Pazho and G. Alinezhad Noghre) contributed equally. The article is accepted by IEEE Transactions on Knowledge and Data Engineering.
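To make the general idea concrete (this is a toy sketch, not any specific method from the survey; the services, call log, and z-score threshold are invented for illustration), one can model a distributed system as a graph of services and flag structurally unusual nodes:

```python
# Toy graph-based anomaly detection: model services as nodes and observed
# calls as edges, then flag nodes whose connectivity deviates strongly
# from the rest of the system (a simple structural anomaly score).
import statistics
from collections import defaultdict

calls = [  # hypothetical (caller, callee) log of service interactions
    ("gateway", "auth"), ("gateway", "orders"), ("orders", "db"),
    ("auth", "db"), ("scanner", "auth"), ("scanner", "orders"),
    ("scanner", "db"), ("scanner", "gateway"),
]

degree = defaultdict(int)
for src, dst in calls:
    degree[src] += 1
    degree[dst] += 1

mean = statistics.mean(degree.values())
std = statistics.stdev(degree.values())

for node, deg in sorted(degree.items()):
    z = (deg - mean) / std if std else 0.0
    flag = "ANOMALY" if abs(z) > 1.5 else "ok"
    print(f"{node:8s} degree={deg} z={z:+.2f} {flag}")
```

Here the hypothetical "scanner" node touches every service and stands out by degree alone; the graph-based methods the survey covers learn far richer structural and temporal signals, but the routing of evidence through graph structure is the common thread.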
Data ethics: building trust: how digital technologies can serve humanity
Data is the magic word of the 21st century, as oil was in the 20th century and electricity in the 19th. For citizens, data means support in daily life in almost all activities, from watch to laptop, from kitchen to car, from mobile phone to politics. For business and politics, data means power, dominance, winning the race. Data can be used for good and bad, for services and hacking, for medicine and the arms race. How can we build trust in this complex and ambiguous data world? How can digital technologies serve humanity? The 45 articles in this book represent a broad range of ethical reflections and recommendations in eight sections: a) Values, Trust and Law; b) AI, Robots and Humans; c) Health and Neuroscience; d) Religions for Digital Justice; e) Farming, Business, Finance; f) Security, War, Peace; g) Data Governance, Geopolitics; h) Media, Education, Communication. The authors and institutions come from all continents. The book serves as reading material for teachers, students, policy makers, politicians, businesses, hospitals, NGOs and religious organisations alike. It is an invitation for dialogue, debate and building trust! The book is a continuation of the volume “Cyber Ethics 4.0”, published in 2018 by the same editors.
Sonic heritage: listening to the past
History is so often told through objects, images and photographs, but the potential of sounds to reveal place and space is often neglected. Our research project ‘Sonic Palimpsest’ explores the potential of sound to evoke impressions and new understandings of the past, to embrace the sonic as a tool to understand what was, in a way that can complement and add to our predominant visual understandings. Our work includes the expansion of the Oral History archives held at Chatham Dockyard to include women’s voices and experiences, and the creation of sonic works to engage the public with their heritage. Our research highlights the social and cultural value of oral history and field recordings in the transmission of knowledge to both researchers and the public. Together these recordings document how buildings and spaces within the dockyard were used and experienced by those who worked there. We can begin to understand the social and cultural roles of these buildings within the community, both past and present.
Lifelong Learning in the Clinical Open World
Despite mounting evidence that data drift causes deep learning models to deteriorate over time, the majority of medical imaging research is developed for - and evaluated on - static closed-world environments. There have been exciting advances in the automatic detection and segmentation of diagnostically relevant findings. Yet the few studies that attempt to validate their performance in actual clinics are met with disappointing results and little utility as perceived by healthcare professionals. This is largely due to the many factors that introduce shifts in medical image data distribution, from changes in the acquisition practices to naturally occurring variations in the patient population and disease manifestation. If we truly wish to leverage deep learning technologies to alleviate the workload of clinicians and drive forward the democratization of health care, we must move away from closed-world assumptions and start designing systems for the dynamic open world.
This entails, first, the establishment of reliable quality assurance mechanisms with methods from the fields of uncertainty estimation, out-of-distribution detection, and domain-aware prediction appraisal. Part I of the thesis summarizes my contributions to this area. I first propose two approaches that identify outliers by monitoring a self-supervised objective or by quantifying the distance to training samples in a low-dimensional latent space. I then explore how to maximize the diversity among members of a deep ensemble for improved calibration and robustness; and present a lightweight method to detect low-quality lung lesion segmentation masks using domain knowledge.
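As a minimal sketch of the latent-distance idea in general (the synthetic data, the PCA latent space, the k-NN scoring, and the percentile threshold are all assumptions for illustration, not the thesis's actual methods):

```python
# Minimal sketch: embed samples in a low-dimensional latent space (here PCA)
# and flag a test sample as out-of-distribution if its distance to the
# nearest training samples exceeds a threshold calibrated on training data.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, size=(500, 64))   # in-distribution features
ood = rng.normal(4.0, 1.0, size=(5, 64))       # shifted, out-of-distribution

pca = PCA(n_components=8).fit(train)           # low-dimensional latent space
z_train = pca.transform(train)

knn = NearestNeighbors(n_neighbors=5).fit(z_train)

def score(x):
    """Mean distance to the 5 nearest training samples in latent space."""
    dists, _ = knn.kneighbors(pca.transform(x))
    return dists.mean(axis=1)

# Calibrate the threshold on the training set itself (good enough for a sketch;
# a held-out calibration split would be cleaner).
threshold = np.percentile(score(train), 95)
print("OOD scores :", score(ood).round(2))
print("flagged    :", score(ood) > threshold)
```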
Of course, detecting failures is only the first step. We ideally want to train models that are reliable in the open world for a large portion of the data. Out-of-distribution generalization and domain adaptation may increase robustness, but only to a certain extent. As time goes on, models can only maintain acceptable performance if they continue learning with newly acquired cases that reflect changes in the data distribution. The goal of continual learning is to adapt to changes in the environment without forgetting previous knowledge. One practical strategy to approach this is expansion, whereby multiple parametrizations of the model are trained and the most appropriate one is selected during inference. In the second part of the thesis, I present two expansion-based methods that do not rely on information regarding when or how the data distribution changes.
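A rough, hypothetical sketch of the expansion idea (toy linear "experts" routed by distance to per-stage feature means; this simplification assumes stage boundaries are known at training time, unlike the thesis's methods, and is not how they work):

```python
# Hypothetical sketch of expansion-based continual learning: one "expert"
# (parameter set) is trained per data stage; at inference each sample is
# routed to the expert whose training data it resembles most, with no
# explicit knowledge of when the distribution shifted.
import numpy as np

class ExpansionModel:
    def __init__(self):
        self.experts = []  # list of (stage_feature_mean, parameter_vector)

    def learn_stage(self, X, y):
        """Fit a least-squares linear expert on one stage and store it."""
        w, *_ = np.linalg.lstsq(X, y, rcond=None)
        self.experts.append((X.mean(axis=0), w))

    def predict(self, x):
        """Route x to the expert with the closest stage mean, then predict."""
        mean, w = min(self.experts, key=lambda e: np.linalg.norm(x - e[0]))
        return x @ w

rng = np.random.default_rng(1)
model = ExpansionModel()
# Stage 1: one input distribution; Stage 2: shifted inputs, different target.
X1 = rng.normal(0, 1, (200, 3)); model.learn_stage(X1, X1 @ np.array([1., 2., 3.]))
X2 = rng.normal(5, 1, (200, 3)); model.learn_stage(X2, X2 @ np.array([-1., 0., 1.]))

x_new = rng.normal(5, 1, 3)                  # resembles stage-2 data
print("prediction   :", model.predict(x_new))
print("stage-2 truth:", x_new @ np.array([-1., 0., 1.]))
```

Because each expert's parameters are frozen once its stage is learned, earlier knowledge cannot be overwritten, which is what makes expansion attractive against forgetting.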
Even when appropriate mechanisms are in place to fail safely and accumulate knowledge over time, this will only translate to clinical usage insofar as the regulatory framework allows it. Current regulations in the USA and European Union only authorize locked systems that do not learn post-deployment. Fortunately, regulatory bodies are noting the need for a modern lifecycle regulatory approach. I review these efforts, along with other practical aspects of developing systems that learn through their lifecycle, in the third part of the thesis.
We are finally at a stage where healthcare professionals and regulators are embracing deep learning. The number of commercially available diagnostic radiology systems is also quickly rising. This opens up our chance - and responsibility - to show that these systems can be safe and effective throughout their lifespan.
Semantic Data Management in Data Lakes
In recent years, data lakes emerged as away to manage large amounts of
heterogeneous data for modern data analytics. One way to prevent data lakes
from turning into inoperable data swamps is semantic data management. Some
approaches propose the linkage of metadata to knowledge graphs based on the
Linked Data principles to provide more meaning and semantics to the data in the
lake. Such a semantic layer may be utilized not only for data management but
also to tackle the problem of data integration from heterogeneous sources, in
order to make data access more expressive and interoperable. In this survey, we
review recent approaches with a specific focus on the application within data
lake systems and scalability to Big Data. We classify the approaches into (i)
basic semantic data management, (ii) semantic modeling approaches for enriching
metadata in data lakes, and (iii) methods for ontologybased data access. In
each category, we cover the main techniques and their background, and compare
latest research. Finally, we point out challenges for future work in this
research area, which needs a closer integration of Big Data and Semantic Web
technologies
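As a minimal, hypothetical sketch of such a semantic layer (the namespace, dataset names, storage path, and vocabulary are assumptions for illustration, not a specific system from the survey), dataset metadata can be expressed as RDF and linked to a shared knowledge-graph concept:

```python
# Minimal sketch of a semantic layer for a data lake: describe a raw dataset
# with RDF metadata and link one of its columns to a knowledge-graph concept,
# so the lake stays discoverable instead of turning into a swamp.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, RDFS

LAKE = Namespace("http://example.org/lake/")   # hypothetical namespace
g = Graph()
g.bind("lake", LAKE)

dataset = LAKE["sales_2023_csv"]
g.add((dataset, RDF.type, LAKE.Dataset))
g.add((dataset, RDFS.label, Literal("sales_2023.csv")))
g.add((dataset, LAKE.storedAt, Literal("s3://lake/raw/sales_2023.csv")))

column = LAKE["sales_2023_csv_customer"]
g.add((column, RDF.type, LAKE.Column))
g.add((column, LAKE.partOf, dataset))
# Link the column to a shared concept, following the Linked Data principles.
g.add((column, LAKE.refersTo, URIRef("http://dbpedia.org/resource/Customer")))

print(g.serialize(format="turtle"))
```

Once metadata lives in such a graph, data access can be mediated by SPARQL queries or an ontology-based data access layer rather than by guessing file names.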
Machine Learning Algorithm for the Scansion of Old Saxon Poetry
Several scholars have designed tools to perform the automatic scansion of poetry in many languages, but none of these tools deals with Old Saxon or Old English. This project aims to be a first attempt to create a tool for these languages. We implemented a Bidirectional Long Short-Term Memory (BiLSTM) model to perform the automatic scansion of Old Saxon and Old English poems. Since this model uses supervised learning, we manually annotated the Heliand manuscript and used the resulting corpus as a labeled dataset to train the model. The evaluation of the algorithm's performance reached 97% accuracy and a 99% weighted average for precision, recall, and F1 score. In addition, we tested the model with some verses from the Old Saxon Genesis and some from The Battle of Brunanburh, and we observed that the model predicted almost all Old Saxon metrical patterns correctly but misclassified the majority of the Old English input verses.
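As a minimal, hypothetical sketch of this kind of model (a toy vocabulary, random tokens, and binary labels stand in for the Heliand annotations; this is not the paper's actual architecture or hyperparameters), a BiLSTM sequence labeller in PyTorch could look like this:

```python
# Toy BiLSTM sequence labeller: maps a sequence of syllable tokens to
# per-syllable metrical labels (e.g., lift vs. drop), mirroring the
# supervised setup of training on an annotated corpus.
import torch
import torch.nn as nn

class ScansionBiLSTM(nn.Module):
    def __init__(self, vocab_size, n_labels, embed_dim=32, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden, batch_first=True,
                            bidirectional=True)
        self.out = nn.Linear(2 * hidden, n_labels)  # 2x: both directions

    def forward(self, x):                # x: (batch, seq_len) of token ids
        h, _ = self.lstm(self.embed(x))  # h: (batch, seq_len, 2*hidden)
        return self.out(h)               # per-syllable label logits

model = ScansionBiLSTM(vocab_size=100, n_labels=2)
tokens = torch.randint(0, 100, (1, 8))   # one verse, 8 syllable ids
labels = torch.randint(0, 2, (1, 8))     # gold metrical labels

loss_fn = nn.CrossEntropyLoss()
logits = model(tokens)
loss = loss_fn(logits.view(-1, 2), labels.view(-1))
loss.backward()                          # one supervised training step
print(loss.item())
```

The bidirectionality matters for scansion: the metrical role of a syllable depends on what follows it in the verse as well as what precedes it.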
Architecture and Advanced Electronics Pathways Toward Highly Adaptive Energy-Efficient Computing
With the explosion of the number of compute nodes, the bottleneck of future computing systems lies in the network architecture connecting the nodes. Addressing this bottleneck requires replacing current backplane-based network topologies. We propose to revolutionize computing electronics by realizing embedded optical waveguides for onboard networking and wireless chip-to-chip links at a 200-GHz carrier frequency connecting neighboring boards in a rack. The control of novel rate-adaptive optical and mm-wave transceivers needs tight interlinking with the system software for runtime resource management.