
    Engineering Agile Big-Data Systems

    To be effective, data-intensive systems require extensive ongoing customisation to reflect changing user requirements, organisational policies, and the structure and interpretation of the data they hold. Manual customisation is expensive, time-consuming, and error-prone. In large complex systems, the value of the data can be such that exhaustive testing is necessary before any new feature can be added to the existing design. In most cases, the precise details of requirements, policies and data will change during the lifetime of the system, forcing a choice between expensive modification and continued operation with an inefficient design. Engineering Agile Big-Data Systems outlines an approach to dealing with these problems in software and data engineering, describing a methodology for aligning these processes throughout product lifecycles. It discusses tools which can be used to achieve these goals, and, in a number of case studies, shows how the tools and methodology have been used to improve a variety of academic and business systems.

    Research in Context: Výzkum v kontextu

    Knowledge of the context of scientific results helps improve the discovery, sharing and reuse of research outputs for users in libraries and research communities. OpenAIREplus facilitates access to the results of European research. One of the goals of OpenAIREplus is to interlink research publications with research data and with information about authors and grants. Using a demonstrative example from the social sciences, this contribution shows how so-called enhanced publications of research outputs can be used.

    Context of scholarly results improves discovery, sharing and re-use in library and research communities. OpenAIREplus facilitates access to European research. Its mission is to interlink research publications, data, contributors and grants. Introducing pilots for the Social & Life Sciences, we show how disciplinary services can be used to enhance publications in cross-disciplinary environments. The approach followed originates from text-mining and interlinks innovative repository services and initiatives like DataCite.
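
    As a rough, hedged illustration of the kind of interlinking this pilot describes, the sketch below builds a tiny RDF graph connecting a publication to a dataset, a contributor and a grant. The URIs, the Dublin Core properties and the fundedBy predicate are assumptions for demonstration only; they are not the OpenAIREplus data model.

```python
# Illustrative sketch only: interlinking a publication with a dataset,
# a contributor and a grant as Linked Data. The URIs and property choices
# are assumptions for demonstration, not OpenAIREplus's actual data model.
from rdflib import Graph, URIRef, Literal, Namespace
from rdflib.namespace import DCTERMS, FOAF

EX = Namespace("http://example.org/")  # hypothetical namespace

g = Graph()
publication = URIRef("https://doi.org/10.1234/pub-1")   # hypothetical DOI
dataset     = URIRef("https://doi.org/10.5678/data-9")  # hypothetical DataCite DOI
author      = EX["person/jane-doe"]
grant       = EX["grant/fp7-123456"]

g.add((publication, DCTERMS.title, Literal("A study in the social sciences")))
g.add((publication, DCTERMS.references, dataset))  # publication -> underlying data
g.add((publication, DCTERMS.creator, author))      # publication -> contributor
g.add((publication, EX.fundedBy, grant))           # hypothetical predicate for the grant link
g.add((author, FOAF.name, Literal("Jane Doe")))

print(g.serialize(format="turtle"))
```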

    Linked Data Quality Assessment and its Application to Societal Progress Measurement

    In recent years, the Linked Data (LD) paradigm has emerged as a simple mechanism for employing the Web as a medium for data and knowledge integration, in which both documents and data are linked. Moreover, the semantics and structure of the underlying data are kept intact, making this the Semantic Web. LD essentially entails a set of best practices for publishing and connecting structured data on the Web, which allows information to be published and exchanged in an interoperable and reusable fashion. Many different communities on the Internet, such as geography, media, the life sciences and government, have already adopted these LD principles. This is confirmed by the dramatically growing Linked Data Web, in which more than 50 billion facts are currently represented. With the emergence of the Web of Linked Data, several use cases become possible thanks to the rich and disparate data integrated into one global information space. Linked Data, in these cases, not only assists in building mashups by interlinking heterogeneous and dispersed data from multiple sources, but also empowers the uncovering of meaningful and impactful relationships. These discoveries have paved the way for scientists to explore the existing data and uncover meaningful outcomes that they might not have been aware of previously.

    In all these use cases utilizing LD, one crippling problem is the underlying data quality. Incomplete, inconsistent or inaccurate data affects the end results gravely, making them unreliable. Data quality is commonly conceived as fitness for use, be it for a certain application or use case. Datasets that contain quality problems may still be useful for certain applications, depending on the use case at hand. LD consumption therefore has to deal with the problem of getting the data into a state in which it can be exploited for real use cases. Insufficient data quality can be caused either by the LD publication process or can be intrinsic to the data source itself. A key challenge is to assess the quality of datasets published on the Web and to make this quality information explicit. Assessing data quality is a particular challenge in LD, as the underlying data stems from a set of multiple, autonomous and evolving data sources. Moreover, the dynamic nature of LD makes quality assessment crucial for measuring how accurately the real world is represented. On the document Web, data quality can only be defined indirectly or vaguely, but LD requires more concrete and measurable data quality metrics. Such metrics include correctness of facts with respect to the real world, adequacy of semantic representation, quality of interlinks, interoperability, timeliness, and consistency with regard to implicit information.

    Even though data quality is an important concept in LD, few methodologies have been proposed to assess the quality of these datasets. In this thesis, we therefore first unify 18 data quality dimensions and provide a total of 69 metrics for the assessment of LD. The first methodology employs LD experts for the assessment. This assessment is performed with the help of the TripleCheckMate tool, which was developed specifically to assist LD experts in assessing the quality of a dataset, in this case DBpedia. The second methodology is a semi-automatic process, in which the first phase involves the detection of common quality problems through the automatic creation of an extended schema for DBpedia, and the second phase involves the manual verification of the generated schema axioms. Thereafter, we employ the wisdom of the crowd, i.e. workers on online crowdsourcing platforms such as Amazon Mechanical Turk (MTurk), to assess the quality of DBpedia. We then compare the two approaches (the previous assessment by LD experts and the assessment by MTurk workers in this study) in order to measure the feasibility of each type of user-driven data quality assessment methodology. Additionally, we evaluate another semi-automated methodology for LD quality assessment, which also involves human judgement. In this semi-automated methodology, selected metrics are formally defined and implemented as part of a tool, namely R2RLint. The user is provided not only with the results of the assessment but also with the specific entities that cause the errors, which helps users understand the quality issues and fix them. Finally, we consider a domain-specific use case that consumes LD and relies on its quality. In particular, we identify four LD sources, assess their quality using the R2RLint tool, and then utilize them in building the Health Economic Research (HER) Observatory. We show the advantages of this semi-automated assessment over the other types of quality assessment methodologies discussed earlier. The Observatory aims at evaluating the impact of research development on the economic and healthcare performance of each country per year. We illustrate the usefulness of LD in this use case and the importance of quality assessment for any data analysis.
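
    To make the idea of a measurable LD quality metric concrete, here is a minimal, hedged sketch that computes a simple property-completeness score over a small RDF graph. It is an illustrative example only, not one of the thesis's 69 metrics and not code from TripleCheckMate or R2RLint; the sample data and the chosen property are assumptions.

```python
# Minimal sketch of a property-completeness metric for Linked Data.
# Illustrative only: not an implementation from the thesis or from the
# TripleCheckMate / R2RLint tools mentioned in the abstract.
from rdflib import Graph, Namespace
from rdflib.namespace import RDF

TURTLE = """
@prefix ex:  <http://example.org/> .
@prefix dbo: <http://dbpedia.org/ontology/> .

ex:Berlin  a dbo:City ; dbo:populationTotal 3769495 .
ex:Leipzig a dbo:City ; dbo:populationTotal 605407 .
ex:Ghost   a dbo:City .                      # missing population -> incompleteness
"""

DBO = Namespace("http://dbpedia.org/ontology/")

g = Graph()
g.parse(data=TURTLE, format="turtle")

cities = set(g.subjects(RDF.type, DBO.City))
with_population = {s for s in cities if (s, DBO.populationTotal, None) in g}

# Completeness = entities carrying the required property / all entities of the class.
completeness = len(with_population) / len(cities)
print(f"dbo:populationTotal completeness for dbo:City: {completeness:.2f}")  # -> 0.67
```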

    Harmonizing and publishing heterogeneous premodern manuscript metadata as Linked Open Data

    Manuscripts are a crucial form of evidence for research into all aspects of premodern European history and culture, and there are numerous databases devoted to describing them in detail. This descriptive information, however, is typically available only in separate data silos based on incompatible data models and user interfaces. As a result, it has been difficult to study manuscripts comprehensively across these various platforms. To address this challenge, a team of manuscript scholars and computer scientists worked to create "Mapping Manuscript Migrations" (MMM), a semantic portal and Linked Open Data service. MMM stands as a successful proof of concept for integrating distinct manuscript datasets into a shared platform for research and discovery, with the potential for future expansion. This paper will discuss the major products of the MMM project: a unified data model, a repeatable data transformation pipeline, a Linked Open Data knowledge graph, and a Semantic Web portal. It will also examine the crucial importance of an iterative process of multidisciplinary collaboration embedded throughout the project, enabling humanities researchers to shape the development of a digital platform and tools, while also enabling the same researchers to ask more sophisticated and comprehensive research questions of the aggregated data. Peer reviewed.
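
    As a hedged illustration of the kind of data transformation pipeline described above, the sketch below converts a flat manuscript metadata record into RDF triples under an invented unified namespace. The field names, URIs and properties are assumptions for demonstration; they are not the actual MMM data model.

```python
# Illustrative sketch: turning a flat manuscript record into Linked Open Data.
# The unified model here (the `mmm:` namespace and its properties) is invented
# for demonstration and is not the actual MMM data model.
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import DCTERMS, RDF

MMM = Namespace("http://example.org/mmm-schema/")  # hypothetical namespace

# A flat record as it might arrive from one of the source databases.
record = {
    "id": "ms-0042",
    "title": "Book of Hours",
    "production_place": "Paris",
    "current_owner": "Example Library",
}

g = Graph()
ms = URIRef(f"http://example.org/manuscript/{record['id']}")

g.add((ms, RDF.type, MMM.Manuscript))
g.add((ms, DCTERMS.title, Literal(record["title"])))
g.add((ms, MMM.productionPlace, Literal(record["production_place"])))
g.add((ms, MMM.currentOwner, Literal(record["current_owner"])))

print(g.serialize(format="turtle"))
```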

    Knowledge Annotation within Research Data Management System for Oxygen-Free Production Technologies

    The comprehensive implementation of digital technologies in product manufacturing leads to changes in engineering processes and requires new approaches to data management. An important role is played by the processes of organizing the collection, storage and reuse of research data obtained and used during the development of products, systems or technologies, taking into account the FAIR data principles. This article describes a Research Data Management System for organizing documentation and measurement requests in the research and development of new oxygen-free production technologies.
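
    The sketch below is a hedged illustration of what a FAIR-oriented measurement-request record could look like; the field names (persistent identifier, access URL, license, provenance links) are assumptions chosen to reflect the FAIR principles, not the schema of the system described in the article.

```python
# Hedged sketch of a FAIR-oriented measurement-request record.
# The fields are illustrative assumptions (findable identifier, access URL,
# interoperable vocabulary term, reuse license), not the actual schema of
# the Research Data Management System described above.
import json
from dataclasses import dataclass, asdict, field
from datetime import date


@dataclass
class MeasurementRequest:
    identifier: str                   # persistent ID -> Findable
    title: str
    requested_by: str
    method: str                       # term from a shared vocabulary -> Interoperable
    access_url: str                   # where the resulting data will live -> Accessible
    license: str = "CC-BY-4.0"        # explicit reuse conditions -> Reusable
    linked_samples: list[str] = field(default_factory=list)  # provenance links


request = MeasurementRequest(
    identifier="req-2023-0017",
    title="Hardness test of oxygen-free copper sample",
    requested_by="orcid:0000-0000-0000-0000",
    method="vickers-hardness",
    access_url="https://example.org/data/req-2023-0017",
    linked_samples=["sample-0042"],
)

print(date.today().isoformat())
print(json.dumps(asdict(request), indent=2))
```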

    Metadata stewardship in nanosafety research: learning from the past, preparing for an "on-the-fly" FAIR future

    Introduction: Significant progress has been made in terms of best practice in research data management for nanosafety. Some of the underlying approaches to date are, however, overly focussed on the needs of specific research projects or aligned to a single data repository, and this “silo” approach is hampering their general adoption by the broader research community and individual labs.

    Methods: State-of-the-art data/knowledge collection, curation, management, FAIRification and sharing solutions applied in the nanosafety field are reviewed, focusing on unique features which should be generalised and integrated into a functional FAIRification ecosystem that addresses the needs of both data generators and data (re)users.

    Results: The development of data capture templates has focussed on standardised single-endpoint Test Guidelines, which does not reflect the complexity of real laboratory processes, where multiple assays are interlinked into an overall study, and where non-standardised assays are developed to address novel research questions and probe mechanistic processes to generate the basis for read-across from one nanomaterial to another. By focussing on the needs of data providers and data users, we identify how existing tools and approaches can be re-framed to enable “on-the-fly” (meta)data definition, data capture, curation and FAIRification that are sufficiently flexible to address the complexity in nanosafety research, yet harmonised enough to facilitate integration of datasets from different sources generated for different research purposes. By mapping the available tools for nanomaterials safety research (including nanomaterials characterisation, non-standard (mechanistic-focussed) methods, measurement principles and experimental setup, environmental fate, and requirements from new research foci such as safe and sustainable by design), a strategy for integration and bridging between silos is presented. The NanoCommons KnowledgeBase has shown how data from different sources can be integrated into a one-stop shop for searching, browsing and accessing data (without copying), and thus how to break the boundaries between data silos.

    Discussion: The next steps are to generalise the approach by defining a process to build consensus (meta)data standards, to develop solutions to make (meta)data more machine-actionable (on-the-fly ontology development), and to establish a distributed FAIR data ecosystem maintained by the community beyond specific projects. Since other multidisciplinary domains might also struggle with data silofication, the learnings presented here may be transferable to facilitate data sharing within other communities and support harmonization of approaches across disciplines to prepare the ground for cross-domain interoperability.

    Visit WorldFAIR online at http://worldfair-project.eu. WorldFAIR is funded by the EC HORIZON-WIDERA-2021-ERA-01-41 Coordination and Support Action under Grant Agreement No. 101058393.
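
    As a hedged sketch of what “on-the-fly” (meta)data definition could look like in practice, the example below starts from a small core template and lets a lab extend it with assay-specific fields at capture time. The field names and controlled terms are invented for illustration and are not taken from the NanoCommons KnowledgeBase or any project-specific template.

```python
# Hedged sketch of "on-the-fly" (meta)data capture: a minimal core template
# that individual labs can extend with assay-specific fields at capture time.
# Field names and example values are invented for illustration; they are not
# taken from the NanoCommons KnowledgeBase or any project-specific template.
import json
from datetime import date

CORE_TEMPLATE = {
    "material_id": None,         # e.g. a nanomaterial identifier (assumed format)
    "assay_name": None,
    "operator": None,
    "date": None,
    "protocol_reference": None,  # DOI or SOP link
}


def capture(core_values: dict, **extra_fields) -> dict:
    """Fill the core template and attach any assay-specific extensions."""
    record = {**CORE_TEMPLATE, **core_values}
    missing = [k for k, v in record.items() if v is None]
    if missing:
        raise ValueError(f"core metadata incomplete: {missing}")
    # Extensions are kept separate so downstream curation can map them
    # to ontology terms later ("on-the-fly" FAIRification).
    record["extensions"] = extra_fields
    return record


record = capture(
    {
        "material_id": "NM-101",
        "assay_name": "cell viability (MTS)",
        "operator": "lab-A",
        "date": str(date.today()),
        "protocol_reference": "https://example.org/sop/mts-v2",
    },
    cell_line="A549",            # assay-specific, non-core field
    exposure_time_h=24,
)

print(json.dumps(record, indent=2))
```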