
    Measuring the Quality of an Integrated Schema


    Determining the Quality of Product Data Integration

    To meet customer demands, companies must manage numerous variants and versions of their products. Since product-related data (e.g., requirements specifications, geometric models, source code, or test cases) are usually scattered over a large number of heterogeneous, autonomous information systems, their integration becomes crucial when developing complex products on the one hand and aiming at reduced development costs on the other. In general, product data are created at different stages of the product development process. Furthermore, they should be integrated in a complete and consistent way at certain milestones of this process (e.g., prototype construction). Usually, this data integration is accomplished manually, which is both costly and error-prone. Instead, semi-automated product data integration is required that meets the data quality requirements of the various stages of product development. In turn, this necessitates close monitoring of the progress of the data integration process based on proper metrics. Contemporary approaches solely focus on metrics assessing schema integration, while not measuring the quality and progress of data integration. This paper elicits fundamental requirements relevant in this context. Based on them, we develop appropriate metrics for measuring product data quality and apply them in a case study conducted at an automotive original equipment manufacturer.
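    As a rough illustration of the kind of metric the paper argues for (monitoring integration progress against milestone requirements), a minimal completeness measure might look like the sketch below; the artifact names and the milestone are hypothetical, not taken from the paper.

        # Hypothetical sketch of a milestone-completeness metric for product
        # data integration progress; artifact names are illustrative only.
        def completeness(required: set[str], integrated: set[str]) -> float:
            """Fraction of the artifacts required at a milestone that have
            already been integrated (1.0 means fully integrated)."""
            if not required:
                return 1.0
            return len(required & integrated) / len(required)

        # Example milestone "prototype construction" requiring four artifact types.
        required = {"requirements_spec", "geometric_model", "source_code", "test_cases"}
        integrated = {"requirements_spec", "geometric_model"}
        print(f"integration progress: {completeness(required, integrated):.0%}")  # 50%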

    Qualitative Effects of Knowledge Rules in Probabilistic Data Integration

    One of the problems in data integration is data overlap: the fact that different data sources contain data on the same real-world entities. Much development time in data integration projects is devoted to entity resolution. Advanced similarity measurement techniques are often used to remove semantic duplicates from the integration result or to resolve other semantic conflicts, but it proves impossible to get rid of all semantic problems in data integration. An often-used rule of thumb states that about 90% of the development effort is devoted to solving the remaining 10% of hard cases. In an attempt to significantly decrease human effort at data integration time, we have proposed an approach that stores any remaining semantic uncertainty and conflicts in a probabilistic database, enabling the integration result to be meaningfully used right away. The main development effort in our approach is devoted to defining and tuning knowledge rules and thresholds. Rules and thresholds directly impact the size and quality of the integration result. We measure integration quality indirectly by measuring the quality of answers to queries on the integrated data set in an information-retrieval-like way. The main contribution of this report is an experimental investigation of the effects and sensitivity of rule definition and threshold tuning on the integration quality. It shows that our approach indeed reduces development effort, rather than merely shifting it to rule definition and threshold tuning, by demonstrating that setting rough, safe thresholds and defining only a few rules suffices to produce a 'good enough' integration that can be meaningfully used.
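    Since the report measures integration quality through query answers in an information-retrieval style, a minimal sketch of such a measurement is given below; the entity identifiers, the reference answer set, and the threshold value are invented for illustration and are not the authors' implementation.

        # Illustrative sketch: judging integration quality indirectly via
        # precision/recall of query answers against a verified reference set.
        def precision_recall(answers: set[str], reference: set[str]) -> tuple[float, float]:
            if not answers or not reference:
                return 0.0, 0.0
            hits = len(answers & reference)
            return hits / len(answers), hits / len(reference)

        # Hypothetical query result after integrating two sources with a
        # rough entity-resolution threshold of 0.8.
        answers = {"entity_1", "entity_2", "entity_5"}
        reference = {"entity_1", "entity_2", "entity_3"}
        p, r = precision_recall(answers, reference)
        print(f"precision={p:.2f} recall={r:.2f}")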

    A schema-based model of situation awareness: Implications for measuring situation awareness

    Measures of pilot situation awareness (SA) are needed in order to know whether new concepts in display design help pilots keep track of rapidly changing tactical situations. To measure SA, a theory of situation assessment is needed. Such a theory is summarized here, encompassing both a definition of SA and a model of situation assessment. SA is defined as the pilot's knowledge about a zone of interest at a given level of abstraction. Pilots develop this knowledge by sampling data from the environment and matching the sampled data to knowledge structures stored in long-term memory. Matched knowledge structures then provide the pilot's assessment of the situation and serve to guide the pilot's attention. A number of cognitive biases that result from the knowledge-matching process are discussed, as are implications for partial-report measures of situation awareness.

    Polarization Imperfections of Light in Interferometry

    The dissertation investigates polarization imperfections of optical components that are used to control and transform the polarization state of light. The theoretical results of this investigation are then applied to selected applications that exploit light polarization, namely to arrangements for high-resolution measurement of vibrating targets, to interferometric measurements for the determination of stress-induced birefringence in transparent materials, and to selected topics in quantum optical communication.
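    As a hedged illustration of the kind of imperfection studied (not an example from the thesis itself), the Jones-calculus sketch below shows how a small retardance error in a half-wave plate leaks light into the unwanted polarization; the 0.02 rad error is an arbitrary assumed value.

        import numpy as np

        def waveplate(delta: float, theta: float) -> np.ndarray:
            """Jones matrix of a linear retarder with retardance delta (rad)
            and fast axis at angle theta (rad) from horizontal."""
            c, s = np.cos(theta), np.sin(theta)
            rot = np.array([[c, s], [-s, c]])
            ret = np.diag([np.exp(-1j * delta / 2), np.exp(1j * delta / 2)])
            return rot.T @ ret @ rot

        horizontal = np.array([1.0, 0.0])                            # horizontally polarized input
        ideal = waveplate(np.pi, np.pi / 4) @ horizontal             # ideal half-wave plate at 45 deg
        imperfect = waveplate(np.pi + 0.02, np.pi / 4) @ horizontal  # small retardance error

        # Residual horizontal intensity reveals the polarization leakage
        # caused by the imperfection (it is ~0 for the ideal plate).
        print(abs(ideal[0])**2, abs(imperfect[0])**2)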

    A global approach to digital library evaluation towards quality interoperability

    This paper describes some of the key research work related to my PhD thesis. The goal is the development of a global approach to digital library (DL) evaluation towards quality interoperability. DL evaluation has a vital role to play in building DLs and in understanding and enhancing their role in society. Responding to two parallel research needs, the project is organised around two tracks. Track one covers the theoretical approach and provides an integrated evaluation model that overcomes the fragmentation of quality assessments; track two covers the experimental side, undertaken through a comparative analysis of different DL evaluation methodologies and relating them to the conceptual framework. After presenting the problem definition, current background, and related work, this paper enumerates a set of research questions and hypotheses that I would like to address, and outlines the research methodology, focusing on a proposed evaluation framework and on the lessons learned from the case studies.

    XML Schema Clustering with Semantic and Hierarchical Similarity Measures

    With the growing popularity of XML as a data representation language, collections of XML data have exploded in number. Methods are required to manage these collections and discover useful information in them for improved document handling. We present a schema clustering process that organises heterogeneous XML schemas into groups. The methodology considers not only the linguistic properties and the context of the elements but also their hierarchical structural similarity. We support our findings with experiments and analysis.
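    As a sketch of how linguistic and hierarchical similarity can be combined for schema elements (the weighting and the path-overlap measure are illustrative assumptions, not the paper's actual formulas):

        # Rough sketch: combined element similarity for XML schema clustering.
        from difflib import SequenceMatcher

        def linguistic_sim(name_a: str, name_b: str) -> float:
            """Name-based (linguistic) similarity between element names, 0..1."""
            return SequenceMatcher(None, name_a.lower(), name_b.lower()).ratio()

        def hierarchical_sim(path_a: list[str], path_b: list[str]) -> float:
            """Overlap of ancestor paths, approximating structural context."""
            a, b = set(path_a), set(path_b)
            return len(a & b) / max(len(a | b), 1)

        def element_sim(a: dict, b: dict, w_ling: float = 0.6, w_struct: float = 0.4) -> float:
            return (w_ling * linguistic_sim(a["name"], b["name"])
                    + w_struct * hierarchical_sim(a["path"], b["path"]))

        e1 = {"name": "authorName", "path": ["book", "author"]}
        e2 = {"name": "author", "path": ["publication", "author"]}
        print(f"combined similarity: {element_sim(e1, e2):.2f}")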

    Assessment of sensor performance

    There is an international commitment to develop a comprehensive, coordinated and sustained ocean observation system. However, the foundation of any observing, monitoring or research effort is effective and reliable in situ sensor technology that accurately measures key environmental parameters. Ultimately, the data used for modelling efforts, management decisions and rapid responses to ocean hazards are only as good as the instruments that collect them. There is also a compelling need to develop and incorporate new or novel technologies to improve all aspects of existing observing systems and meet various emerging challenges. Assessment of Sensor Performance was a cross-cutting issues session at the international OceanSensors08 workshop in Warnemünde, Germany, a theme that also found its way into several of the papers published as a result of the workshop (Denuault, 2009; Kröger et al., 2009; Zielinski et al., 2009). The discussions focused on how best to classify and validate the instruments required for effective and reliable ocean observations and research. The following is a summary of the discussions and conclusions drawn from this workshop, specifically addressing the characterisation of sensor systems, technology readiness levels, verification of sensor performance, and quality management of sensor systems.

    Integrating and Ranking Uncertain Scientific Data

    Mediator-based data integration systems resolve exploratory queries by joining data elements across sources. In the presence of uncertainties, such multiple expansions can quickly lead to spurious connections and incorrect results. The BioRank project investigates formalisms for modeling uncertainty during scientific data integration and for ranking uncertain query results. Our motivating application is protein function prediction. In this paper we show that: (i) explicit modeling of uncertainties as probabilities increases our ability to predict less-known or previously unknown functions (though it does not improve prediction of well-known ones). This suggests that probabilistic uncertainty models offer utility for scientific knowledge discovery; (ii) small perturbations in the input probabilities tend to produce only minor changes in the quality of our result rankings. This suggests that our methods are robust against slight variations in the way uncertainties are transformed into probabilities; and (iii) several techniques allow us to evaluate our probabilistic rankings efficiently. This suggests that probabilistic query evaluation is not as hard for real-world problems as theory indicates.
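    As a minimal sketch of ranking uncertain, joined results by probability (not BioRank's actual implementation; the candidate functions, link probabilities, and independence assumption are illustrative):

        # Rank candidate results by the probability that every uncertain
        # join link on the path leading to them holds, assuming independence.
        def chain_probability(link_probs: list[float]) -> float:
            p = 1.0
            for lp in link_probs:
                p *= lp
            return p

        # Hypothetical candidate protein functions, each reached via a path
        # of uncertain joins across integrated sources.
        candidates = {
            "GO:0003700": [0.9, 0.8],        # two fairly confident links
            "GO:0005515": [0.95, 0.6, 0.5],  # longer, more speculative path
        }
        for func in sorted(candidates, key=lambda f: chain_probability(candidates[f]), reverse=True):
            print(func, round(chain_probability(candidates[func]), 3))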