110 research outputs found

    Wrapper Maintenance: A Machine Learning Approach

    The proliferation of online information sources has led to an increased use of wrappers for extracting data from Web sources. While most previous research has focused on quick and efficient generation of wrappers, the development of tools for wrapper maintenance has received less attention. This is an important research problem because Web sources often change in ways that prevent the wrappers from extracting data correctly. We present an efficient algorithm that learns structural information about data from positive examples alone. We describe how this information can be used for two wrapper maintenance applications: wrapper verification and reinduction. The wrapper verification system detects when a wrapper is not extracting correct data, usually because the Web source has changed its format. The reinduction algorithm automatically recovers from changes in the Web source by identifying data on Web pages so that a new wrapper may be generated for this source. To validate our approach, we monitored 27 wrappers over a period of a year. The verification algorithm correctly discovered 35 of the 37 wrapper changes, and made 16 mistakes, resulting in precision of 0.73 and recall of 0.95. We validated the reinduction algorithm on ten Web sources. We were able to successfully reinduce the wrappers, obtaining precision and recall values of 0.90 and 0.80 on the data extraction task.
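
As a rough illustration of how the reported verification figures relate, the sketch below recomputes precision and recall from counts. The detected and total change counts (35 and 37) come from the abstract. The abstract does not break down the 16 mistakes, so the false-alarm count used here is an assumption, chosen only because it reproduces the reported precision of 0.73 after rounding. The function and variable names are hypothetical.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) from raw detection counts."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return precision, recall


detected_changes = 35                           # from the abstract
total_changes = 37                              # from the abstract
missed_changes = total_changes - detected_changes

# ASSUMPTION: not stated in the abstract; picked so that the rounded
# precision matches the reported 0.73.
assumed_false_alarms = 13

precision, recall = precision_recall(
    detected_changes, assumed_false_alarms, missed_changes
)
print(round(precision, 2), round(recall, 2))
```

Recall depends only on the two counts given in the abstract; precision depends on the assumed false-alarm count and should be read as a consistency check, not as the authors' own breakdown.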

    On the Mental Workload Assessment of Uplift Mapping Representations in Linked Data

    Self-reporting procedures have been widely employed in the literature to measure the mental workload experienced by users when executing a specific task. This research proposes applying these mental workload assessment techniques to the task of creating uplift mappings in Linked Data. A user study was performed to compare the mental workload of "manually" creating such mappings, using a formal mapping language and a text editor, with that of using a visual representation, based on the block metaphor, that generates these mappings. Two subjective mental workload instruments, namely the NASA Task Load Index and the Workload Profile, were applied in this study. Preliminary results show the reliability of these instruments in measuring the perceived mental workload for the task of creating uplift mappings. Results also indicate that participants using the visual representation achieved lower and more consistent mental workload scores.

    Scaling up Planning by teasing out Resource Scheduling

    Planning consists of an action selection phase, where actions are selected and ordered to reach the desired goals, and a resource allocation phase, where enough resources are assigned to ensure the successful execution of the chosen actions. In most real-world problems, these two phases are loosely coupled. Most existing planners do not exploit this loose coupling, and perform both action selection and resource assignment with the same algorithm. We shall show that this strategy severely curtails the scale-up potential of existing planners, including such recent ones as Graphplan and Blackbox. In response, we propose a novel planning framework in which resource allocation is teased apart from planning and is handled in a separate "scheduling" phase. We ignore resource constraints during planning and produce an abstract plan that can correctly achieve the goals but for the resource constraints. Next, based on the actual resource availability, the abstract plan will be ..

    Networks for Autonomous Formation Flying Satellite Systems

    The performance of three communications networks to support autonomous multi-spacecraft formation flying systems is presented. All systems comprise a ten-satellite formation arranged in a star topology, with one of the satellites designated as the central or "mother ship." All data is routed through the mother ship to the terrestrial network. The first system uses a TCP/IP over ATM protocol architecture within the formation; the second system uses the IEEE 802.11 protocol architecture within the formation; and the last system uses both of the previous architectures with a constellation of geosynchronous satellites serving as an intermediate point-of-contact between the formation and the terrestrial network. The simulations consist of file transfers using either the File Transfer Protocol (FTP) or the Simple Automatic File Exchange (SAFE) Protocol. The results compare the IP queuing delay and IP processing delay at the mother ship, as well as application-level round-trip time, for both systems. In all cases, using IEEE 802.11 within the formation yields less delay. Also, the throughput exhibited by SAFE is better than that of FTP.