13 research outputs found

    Optimized Seamless Integration of Biomolecular Data

    Today, scientific data is inevitably digitized, stored in a wide variety of heterogeneous formats, and accessible over the Internet. Scientists need an integrated view of multiple remote or local heterogeneous data sources. They then integrate the results of complex queries and apply further analysis and visualization to support the task of scientific discovery. Building such a digital library for scientific discovery requires accessing and manipulating data extracted from flat files or databases, documents retrieved from the Web, as well as data that is locally materialized in warehouses or generated by software. We consider several tasks to provide optimized and seamless integration of biomolecular data. Challenges to be addressed include capturing and representing source capabilities; developing a methodology to acquire and represent semantic knowledge and metadata about source contents, overlap in source contents, and access costs; and decision support to select sources and capabilities using cost-based and semantic knowledge, and to generate low-cost query evaluation plans. (Also referenced as UMIACS-TR-2001-51.)
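
    The decision-support challenge above amounts to choosing sources under cost and capability constraints. As a rough illustration only (the source names, capabilities, and costs below are invented, and this is not the report's actual method), a greedy planner might pick the cheapest combination of sources covering the attributes a query needs:

        # Hypothetical sketch of cost-based source selection (invented sources).
        # Each source advertises the attributes it can supply and an access cost.
        sources = {
            "genbank_mirror": {"attrs": {"sequence", "organism"}, "cost": 5.0},
            "local_warehouse": {"attrs": {"sequence", "annotation"}, "cost": 1.0},
            "web_wrapper": {"attrs": {"organism", "annotation"}, "cost": 3.0},
        }

        def select_sources(required, sources):
            """Greedily cover `required`, preferring coverage per unit cost."""
            chosen, covered = [], set()
            while not required <= covered:
                best = max(
                    (s for s in sources if sources[s]["attrs"] - covered),
                    key=lambda s: len(sources[s]["attrs"] - covered) / sources[s]["cost"],
                    default=None,
                )
                if best is None:  # nothing left can extend coverage
                    raise ValueError(f"cannot cover: {required - covered}")
                chosen.append(best)
                covered |= sources[best]["attrs"]
            return chosen

        print(select_sources({"sequence", "organism", "annotation"}, sources))
        # -> ['local_warehouse', 'web_wrapper']

    A real planner along the lines sketched in the abstract would also use semantic knowledge about overlap between sources, which this coverage-per-cost heuristic ignores.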

    Handling Failures in Data Quality Measures

    A successful data quality (DQ) measure is important for many data consumers (or data guardians) deciding on the acceptability of the data concerned. Nevertheless, little is known about how “failures” of DQ measures can be handled by data guardians in the presence of the factor(s) that contribute to the failures. This paper presents a review of failure handling mechanisms for DQ measures. The failure factors faced by existing DQ measures are presented, together with the research gaps with respect to failure handling mechanisms in DQ frameworks. In particular, by comparing existing DQ frameworks in terms of the inputs used to measure DQ, the way DQ scores are computed, and the way DQ scores are stored, we identify failure factors inherent within the frameworks. We propose ways to maximise the situations in which DQ scores can still be produced when factors that would cause the failure of currently proposed scoring mechanisms are present. Understanding how failures can be handled will lead to the design of a systematic failure handling mechanism for robust DQ measures.
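
    As a toy illustration of what graceful failure handling can look like (purely a sketch; the abstract does not specify the paper's actual mechanisms), a completeness measure might withhold or degrade its score, with explanatory notes, instead of failing when expected inputs are missing:

        # Hypothetical sketch of a failure-tolerant DQ (completeness) measure.
        def completeness(records, required_fields):
            """Return (score, notes); degrade gracefully instead of failing."""
            notes = []
            if not records:  # failure factor: no input data at all
                return None, ["no records supplied; score withheld"]
            usable = [f for f in required_fields if any(f in r for r in records)]
            missing = set(required_fields) - set(usable)
            if missing:  # failure factor: fields no record carries
                notes.append(f"never-observed fields excluded: {sorted(missing)}")
            if not usable:
                return None, notes + ["no measurable fields; score withheld"]
            filled = sum(
                1 for r in records for f in usable if r.get(f) not in (None, "")
            )
            return filled / (len(records) * len(usable)), notes

        score, notes = completeness(
            [{"name": "a", "age": 1}, {"name": "b"}], ["name", "age", "email"]
        )
        print(score, notes)  # 0.75, plus a note that 'email' was never observed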

    Building Intelligent Web Applications Using Lightweight Wrappers

    The Web so far has been incredibly successful at delivering information to human users. So successful, in fact, that there is now an urgent need to go beyond the browsing human. Unfortunately, the Web is not yet a well-organized repository of nicely structured documents, but rather a conglomerate of volatile HTML pages. To address this problem, we present the World Wide Web Wrapper Factory (W4F), a toolkit for the generation of wrappers for Web sources, that offers: (1) an expressive language to specify the extraction of complex structures from HTML pages; (2) a declarative mapping to various data formats like XML; (3) some visual tools to make the engineering of wrappers faster and easier.
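
    To make the wrapper idea concrete, here is a generic sketch in the same spirit (it does not use W4F's actual extraction language; the page fragment and rule are invented): an extraction rule pulls structures out of HTML, and a declarative mapping emits XML:

        # Hypothetical sketch of a lightweight wrapper over an invented page.
        import re
        import xml.etree.ElementTree as ET

        HTML = """
        <ul>
          <li><b>Paper One</b> (2001)</li>
          <li><b>Paper Two</b> (2017)</li>
        </ul>
        """

        # Extraction rule: named groups stand in for the structures a wrapper
        # language would let you address within the HTML page.
        RULE = re.compile(r"<li><b>(?P<title>.*?)</b>\s*\((?P<year>\d{4})\)</li>")

        # Declarative mapping from extracted fields to an XML target format.
        root = ET.Element("papers")
        for m in RULE.finditer(HTML):
            ET.SubElement(root, "paper", year=m.group("year")).text = m.group("title")

        print(ET.tostring(root, encoding="unicode"))
        # -> <papers><paper year="2001">Paper One</paper>...</papers>

    Real pages are far more volatile than this fragment, which is why a toolkit like W4F pairs a richer extraction language with visual tools for building and maintaining the rules.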

    Efficient Feedback Collection for Pay-as-you-go Source Selection

    Technical developments, such as the web of data and web data extraction, combined with policy developments such as those relating to open government or open science, are leading to the availability of increasing numbers of data sources. Indeed, given these physical sources, it is then also possible to create further virtual sources that integrate, aggregate or summarise the data from the original sources. As a result, there is a plethora of data sources, from which a small subset may be able to provide the information required to support a task. The number and rate of change of the available sources is likely to make manual source selection and curation by experts impractical for many applications, leading to the need to pursue a pay-as-you-go approach, in which crowds or data consumers annotate results based on their correctness or suitability, with the resulting annotations used to inform, e.g., source selection algorithms. However, for pay-as-you-go feedback collection to be cost-effective, it may be necessary to select judiciously the data items on which feedback is to be obtained. This paper describes OLBP (Ordering and Labelling By Precision), a heuristics-based approach to the targeting of data items for feedback to support mapping and source selection tasks, where users express their preferences in terms of the trade-off between precision and recall. The proposed approach is then evaluated on two different scenarios: mapping selection with synthetic data, and source selection with real data produced by web data extraction. The results demonstrate a significant reduction in the amount of feedback required to reach user-provided objectives when using OLBP.
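
    A minimal sketch of the general idea (the heuristic, names, and numbers here are hypothetical, not the published OLBP algorithm): order candidate items by estimated precision and request feedback only until the labelled prefix stops meeting the user's precision target:

        # Hypothetical sketch: target feedback at the items that matter for a
        # precision goal, instead of annotating everything.
        def collect_feedback(candidates, oracle, target_precision):
            """candidates: list of (item, estimated_precision); oracle: item -> bool.
            Label items in descending estimated precision; keep the prefix
            that still meets the target."""
            ordered = sorted(candidates, key=lambda c: c[1], reverse=True)
            selected, correct, asked = [], 0, 0
            for item, _est in ordered:
                asked += 1
                correct += oracle(item)           # one unit of (crowd) feedback
                selected.append(item)
                if correct / asked < target_precision:  # prefix fell below target
                    selected.pop()
                    break
            return selected, asked

        # Toy run: items 'a'..'e' with guessed precisions; oracle knows the truth.
        truth = {"a": True, "b": True, "c": False, "d": True, "e": False}
        cands = [("a", 0.9), ("b", 0.8), ("c", 0.7), ("d", 0.6), ("e", 0.5)]
        picked, cost = collect_feedback(cands, truth.get, target_precision=0.7)
        print(picked, "using", cost, "feedback requests")  # ['a', 'b'] using 3

    Stopping at the first prefix that misses the target is what keeps the feedback cost low; a recall-oriented user would instead keep labelling further down the ordering.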

    WISM'07 : 4th international workshop on web information systems modeling
