
    Propelling the Potential of Enterprise Linked Data in Austria. Roadmap and Report

    In times of digital transformation, and considering the potential of the data-driven economy, it is crucial not only that data is made available, but also that data sources can be trusted, data integrity can be guaranteed, the necessary privacy and security mechanisms are in place, and data and access comply with policies and legislation. In many cases, complex and interdisciplinary questions cannot be answered from a single dataset, so it is necessary to combine data from multiple disparate sources. However, because most data today is locked up in isolated silos, it cannot be used to its full potential. The core challenge for most organisations and enterprises with regard to data exchange and integration is to combine data from internal and external sources in a manner that supports both day-to-day operations and innovation. Linked Data is a promising data publishing and integration paradigm that builds upon standard web technologies. It supports the publishing of structured data in a semantically explicit and interlinked manner, such that the data can be easily connected and consequently becomes more interoperable and useful. The PROPEL project - Propelling the Potential of Enterprise Linked Data in Austria - surveyed technological challenges, entrepreneurial opportunities, and open research questions on the use of Linked Data in a business context, and developed a roadmap and a set of recommendations for policy makers, industry, and the research community. Shifting away from a predominantly academic perspective and an exclusive focus on open data, the project looked at Linked Data as an emerging disruptive technology that enables efficient enterprise data management in the rising data economy.
    Current market forces provide many opportunities, but also present several data and information management challenges. Because Linked Data enables advanced analytics and decision-making, it is particularly suitable for addressing these challenges. In our research, we identified a variety of highly promising use cases for Linked Data in an enterprise context. Examples of promising application domains include "customization and customer relationship management", "automatic and dynamic content production, adaptation and display", "data search, information retrieval and knowledge discovery", as well as "data and information exchange and integration". The analysis also revealed broad potential across a large spectrum of industries whose structural and technological characteristics align well with Linked Data characteristics and principles: energy, retail, finance and insurance, government, health, transport and logistics, telecommunications, media, tourism, engineering, and research and development rank among the most promising industries for the adoption of Linked Data principles.
    In addition to approaching the subject from an industry perspective, we also examined the topics and trends emerging from the research community in the field of Linked Data and the Semantic Web. Although our analysis revolved around a vibrant and active community composed of academia and leading companies involved in semantic technologies, we found that industry needs and research discussions are somewhat misaligned. Whereas foundational technologies such as knowledge representation, data creation/publishing/sharing, data management, and systems engineering are strongly represented in scientific papers, specific topics such as recommendations, and cross-cutting topics such as machine learning or privacy and security, are only marginally present. Topics such as big/large data and the internet of things are (still) on an upward trajectory in terms of attention. In contrast, topics that are very relevant for industry, such as application-oriented topics or those relating to security, privacy, and robustness, are not attracting much attention. With respect to standardisation efforts, we identified a clear need for a more in-depth analysis of the effectiveness of existing standards, the degree of coverage they provide with respect to the foundations they belong to, and the suitability of alternative standards that do not fall under the core Semantic Web umbrella.
    Taking into consideration market forces, the sector analysis of Linked Data potential, the demand-side analysis, and the current technological status, it is clear that Linked Data has a lot of potential for enterprises and can act as a key driver of technological, organizational, and economic change. However, ensuring a solid foundation for Enterprise Linked Data requires greater awareness of the potential of Linked Data in enterprises, lower entry barriers via education and training, better alignment between industry demands and research activities, and greater support for technology transfer from universities to companies. The PROPEL roadmap recommends concrete measures to propel the adoption of Linked Data in Austrian enterprises, structured around fields of activity such as "awareness and education", "technological innovation, research gaps, standardisation", "policy and legal", and "funding". Key short-term recommendations include the clustering of existing activities in order to raise visibility at an international level, the funding of key topics that are underrepresented in the community, and the setup of joint projects. In the medium term, we recommend strengthening existing academic and private education efforts via certification, and establishing flagship projects based on national use cases that can serve as blueprints for transnational initiatives; this requires not only financial support, but also infrastructure support, such as data and services to build solutions on top of. In the long term, we recommend cooperation with international funding schemes to establish and foster a European-level agenda, and the setup of centres of excellence.
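    As a hedged illustration of the Linked Data publishing paradigm described above, the following sketch expresses a hypothetical enterprise record as RDF triples with explicit semantics and a link to an external, dereferenceable resource, using the rdflib Python library. The ex: namespace, the customer URI, and the DBpedia link are assumptions for illustration only and are not taken from the PROPEL project.

    # Minimal Linked Data publishing sketch (illustrative only).
    from rdflib import Graph, Literal, Namespace, URIRef
    from rdflib.namespace import RDF, RDFS, FOAF

    EX = Namespace("http://example.org/enterprise/")  # hypothetical enterprise namespace

    g = Graph()
    g.bind("ex", EX)
    g.bind("foaf", FOAF)

    customer = EX["customer/42"]                      # hypothetical customer record
    g.add((customer, RDF.type, FOAF.Organization))
    g.add((customer, FOAF.name, Literal("ACME GmbH")))
    # Linking to an external Linked Data resource makes the record connectable
    # with other datasets, which is the interoperability benefit discussed above.
    g.add((customer, RDFS.seeAlso, URIRef("http://dbpedia.org/resource/Vienna")))

    print(g.serialize(format="turtle"))               # structured, semantically explicit output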

    Distributed learning: Developing a predictive model based on data from multiple hospitals without data leaving the hospital – A real life proof of concept

    Purpose: One of the major hurdles in enabling personalized medicine is obtaining sufficient patient data to feed into predictive models. Combining data originating from multiple hospitals is difficult because of ethical, legal, political, and administrative barriers associated with data sharing. In order to avoid these issues, a distributed learning approach can be used. Distributed learning is defined as learning from data without the data leaving the hospital.
    Patients and methods: Clinical data from 287 lung cancer patients, treated with curative intent with chemoradiation (CRT) or radiotherapy (RT) alone, were collected from and stored in 5 different medical institutes: 123 patients at MAASTRO (Netherlands, Dutch), 24 at Jessa (Belgium, Dutch), 34 at Liege (Belgium, Dutch and French), 48 at Aachen (Germany, German), and 58 at Eindhoven (Netherlands, Dutch). A Bayesian network model is adapted for distributed learning (watch the animation: http://youtu.be/nQpqMIuHyOk). The model predicts dyspnea, which is a common side effect after radiotherapy treatment of lung cancer.
    Results: We show that it is possible to use the distributed learning approach to train a Bayesian network model on patient data originating from multiple hospitals without these data leaving the individual hospital. The AUC of the model is 0.61 (95% CI 0.51–0.70) on 5-fold cross-validation and ranges from 0.59 to 0.71 on external validation sets.
    Conclusion: Distributed learning allows predictive models to be learned from data originating from multiple hospitals while avoiding many of the data-sharing barriers. Furthermore, the distributed learning approach can be used to extract and employ knowledge from routine patient data from multiple hospitals while remaining compliant with the various national and European privacy laws.
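    The following is a minimal sketch of the general distributed-learning idea described above, not the authors' actual algorithm: each hospital keeps its patient records locally and shares only aggregated counts, from which a coordinator estimates a conditional probability table of a simple Bayesian network node such as P(dyspnea | treatment). The function names and toy records are hypothetical.

    from collections import Counter

    def local_counts(records):
        """Runs inside a hospital: count (treatment, dyspnea) combinations locally."""
        counts = Counter()
        for r in records:
            counts[(r["treatment"], r["dyspnea"])] += 1
        return counts  # only these aggregates leave the hospital, never the records

    def estimate_cpt(counts_per_site):
        """Runs at the coordinator: pool counts and estimate P(dyspnea | treatment)."""
        total = Counter()
        for c in counts_per_site:
            total.update(c)
        cpt = {}
        for (treatment, dyspnea), n in total.items():
            denom = sum(v for (t, _), v in total.items() if t == treatment)
            cpt[(treatment, dyspnea)] = n / denom
        return cpt

    # Hypothetical toy data for two sites (values are illustrative only).
    site_a = [{"treatment": "CRT", "dyspnea": 1}, {"treatment": "RT", "dyspnea": 0}]
    site_b = [{"treatment": "CRT", "dyspnea": 0}, {"treatment": "CRT", "dyspnea": 1}]
    print(estimate_cpt([local_counts(site_a), local_counts(site_b)]))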

    Integration among databases and data sets to support productive nanotechnology: Challenges and recommendations

    Many groups within the broad field of nanoinformatics are already developing data repositories and analytical tools driven by their individual organizational goals. Integrating these data resources across disciplines and with non-nanotechnology resources can support multiple objectives by enabling the reuse of the same information. Integration can also serve as the impetus for novel scientific discoveries by providing the framework to support deeper data analyses. This article discusses current data integration practices in nanoinformatics and in comparable mature fields, and nanotechnology-specific challenges impacting data integration. Based on results from a nanoinformatics-community-wide survey, recommendations for achieving integration of existing operational nanotechnology resources are presented. Nanotechnology-specific data integration challenges, if effectively resolved, can foster the application and validation of nanotechnology within and across disciplines. This paper is one of a series of articles by the Nanomaterial Data Curation Initiative that address data issues such as data curation workflows, data completeness and quality, curator responsibilities, and metadata

    Semantic data integration and knowledge graph creation at scale

    Unlike data, knowledge is often abstract. Concrete knowledge can be achieved through the inclusion of semantics in data models, which highlights the role of data integration. The massive growth of data in recent years has increased the demand for scalable data management techniques; materialized data integration, a.k.a. knowledge graph creation, falls into that category. In this thesis, we investigate efficient methods and techniques for materializing data integration.
    We formalize the process of materializing data integration and formally define the characteristics of a materialized data integration system that merges data operators and sources. Owing to this formalism, both layers of data integration, i.e., data-level and schema-level integration, are formalized in the context of mapping assertions. We then explore optimization opportunities for improving the materialization of data integration systems. We identify three angles, covering intra- and inter-mapping assertions, from which the materialization can be improved, and accordingly propose source-based, mapping-based, and inter-mapping-assertion groups of optimization techniques. We apply the proposed techniques in three real-world projects and illustrate how they contribute to meeting the objectives of those projects.
    Furthermore, we study the parameters impacting the performance of materializing data integration. Relying on parameters reported in the literature as well as parameters presumed to have an impact, we build four groups of testbeds. We empirically study the performance of these testbeds, in terms of execution time, in the presence and absence of our proposed techniques, and observe savings of up to 75%.
    Lastly, we contribute to facilitating the definition of declarative data integration systems. We propose two sets of data operation function signatures in the Function Ontology (FnO): the first set performs entity alignment by resorting to an entity and relation linking tool, while the second consists of domain-specific functions that align genomic entities by harmonizing their representations. Finally, we introduce a tool equipped with a user interface that facilitates the definition of declarative mapping rules by allowing users to explore the data sources and the unified schema while defining their correspondences.
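    As a hedged sketch of what materializing a single data integration (knowledge graph creation) step can look like, the following applies one hand-written mapping assertion to a hypothetical tabular source and produces RDF triples in a unified schema, using the rdflib Python library. The unified: namespace, the source columns, and the mapping itself are assumptions for illustration; the thesis's actual engine, optimization techniques, and FnO functions are not shown.

    import csv, io
    from rdflib import Graph, Literal, Namespace
    from rdflib.namespace import RDF

    UNIFIED = Namespace("http://example.org/unified/")  # hypothetical unified schema

    # Hypothetical tabular data source.
    source = io.StringIO("patient_id,diagnosis\n42,lung cancer\n43,dyspnea\n")

    g = Graph()
    for row in csv.DictReader(source):
        subject = UNIFIED["patient/" + row["patient_id"]]
        # Mapping assertion: source columns -> class and property of the unified schema.
        g.add((subject, RDF.type, UNIFIED.Patient))
        g.add((subject, UNIFIED.hasDiagnosis, Literal(row["diagnosis"])))

    print(g.serialize(format="turtle"))  # the materialized knowledge graph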

    Storage, Querying and Inferencing for Semantic Web Languages

    Harmelen, F.A.H. van [Promotor]

    Relieving the cognitive load of constructing molecular biological ontology based queries by means of visual aids.

    Thesis (M.Comp.Sc.) - University of KwaZulu-Natal, Pietermaritzburg, 2007.
    The domain of molecular biology is complex and vast. Bio-ontologies and information visualisation have arisen in recent years as means of assisting biologists in making sense of this information. Ontologies can enable the construction of conceptual queries, but existing systems for doing so are too technical for most biologists. OntoDas, the software developed as part of this thesis, demonstrates how applying techniques from information visualisation and human-computer interaction can result in software that enables biologists to construct conceptual queries.
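    As a hedged illustration of the kind of ontology-based conceptual query that a tool like OntoDas helps biologists assemble, the following runs a small SPARQL query over a toy in-memory graph of Gene Ontology-style terms using the rdflib Python library; the graph contents and the query are assumptions for illustration and are not taken from OntoDas itself.

    from rdflib import Graph, Literal, Namespace
    from rdflib.namespace import RDFS

    OBO = Namespace("http://purl.obolibrary.org/obo/")

    # Toy graph: two GO-style terms and one subclass relation (illustrative only).
    g = Graph()
    g.add((OBO.GO_0006915, RDFS.label, Literal("apoptotic process")))
    g.add((OBO.GO_0008219, RDFS.label, Literal("cell death")))
    g.add((OBO.GO_0006915, RDFS.subClassOf, OBO.GO_0008219))

    # Conceptual query: "which processes are kinds of cell death?"
    query = """
    SELECT ?term ?label WHERE {
      ?term rdfs:subClassOf obo:GO_0008219 ;
            rdfs:label ?label .
    }
    """
    for term, label in g.query(query, initNs={"rdfs": RDFS, "obo": OBO}):
        print(term, label)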

    Cross-Platform Text Mining and Natural Language Processing Interoperability - Proceedings of the LREC2016 conference

    No abstract available