80,403 research outputs found

    Wrapper Maintenance: A Machine Learning Approach

    Full text link
    The proliferation of online information sources has led to an increased use of wrappers for extracting data from Web sources. While most of the previous research has focused on quick and efficient generation of wrappers, the development of tools for wrapper maintenance has received less attention. This is an important research problem because Web sources often change in ways that prevent the wrappers from extracting data correctly. We present an efficient algorithm that learns structural information about data from positive examples alone. We describe how this information can be used for two wrapper maintenance applications: wrapper verification and reinduction. The wrapper verification system detects when a wrapper is not extracting correct data, usually because the Web source has changed its format. The reinduction algorithm automatically recovers from changes in the Web source by identifying data on Web pages so that a new wrapper may be generated for this source. To validate our approach, we monitored 27 wrappers over a period of a year. The verification algorithm correctly discovered 35 of the 37 wrapper changes, and made 16 mistakes, resulting in precision of 0.73 and recall of 0.95. We validated the reinduction algorithm on ten Web sources. We were able to successfully reinduce the wrappers, obtaining precision and recall values of 0.90 and 0.80 on the data extraction task

    Automating the IEEE std. 1500 compliance verification for embedded cores

    Get PDF
    The IEEE 1500 standard for embedded core testing proposes a very effective solution for testing modern system-on-chip (SoC). It proposes a flexible hardware test wrapper architecture, together with a core test language (CTL) used to describe the implemented wrapper functionalities. Already several IP providers have announced compliance in both existing and future design blocks. In this paper we address the challenge of guaranteeing the compliance of a wrapper architecture and its CTL description to the IEEE std. 1500. This is a mandatory step to fully trust the wrapper functionalities in applying the test sequences to the core. The proposed solution aims at implementing a verification framework allowing core providers and/or integrators to automatically verify the compliancy of their products (sold or purchased) to the standar

    IEEE Standard 1500 Compliance Verification for Embedded Cores

    Get PDF
    Core-based design and reuse are the two key elements for an efficient system-on-chip (SoC) development. Unfortunately, they also introduce new challenges in SoC testing, such as core test reuse and the need of a common test infrastructure working with cores originating from different vendors. The IEEE 1500 Standard for Embedded Core Testing addresses these issues by proposing a flexible hardware test wrapper architecture for embedded cores, together with a core test language (CTL) used to describe the implemented wrapper functionalities. Several intellectual property providers have already announced IEEE Standard 1500 compliance in both existing and future design blocks. In this paper, we address the problem of guaranteeing the compliance of a wrapper architecture and its CTL description to the IEEE Standard 1500. This step is mandatory to fully trust the wrapper functionalities in applying the test sequences to the core. We present a systematic methodology to build a verification framework for IEEE Standard 1500 compliant cores, allowing core providers and/or integrators to verify the compliance of their products (sold or purchased) to the standar

    On Systematic Design of Protectors for Employing OTS Items

    Get PDF
    Off-the-shelf (OTS) components are increasingly used in application areas with stringent dependability requirements. Component wrapping is a well known structuring technique used in many areas. We propose a general approach to developing protective wrappers that assist in integrating OTS items with a focus on the overall system dependability. The wrappers are viewed as redundant software used to detect errors or suspicious activity and to execute appropriate recovery when possible; wrapper development is considered as a part of system integration activities. Wrappers are to be rigorously specified and executed at run time as a means of protecting OTS items against faults in the rest of the system, and the system against the OTS item's faults. Possible symptoms of erroneous behaviour to be detected by a protective wrapper and possible actions to be undertaken in response are listed and discussed. The information required for wrapper development is provided by traceability analysis. Possible approaches to implementing “protectors” in the standard current component technologies are briefly outline

    Sample-based XPath Ranking for Web Information Extraction

    Get PDF
    Web information extraction typically relies on a wrapper, i.e., program code or a configuration that specifies how to extract some information from web pages at a specific website. Manually creating and maintaining wrappers is a cumbersome and error-prone task. It may even be prohibitive as some applications require information extraction from previously unseen websites. This paper approaches the problem of automatic on-the-fly wrapper creation for websites that provide attribute data for objects in a ‘search – search result page – detail page’ setup. The approach is a wrapper induction approach which uses a small and easily obtainable set of sample data for ranking XPaths on their suitability for extracting the wanted attribute data. Experiments show that the automatically generated top-ranked XPaths indeed extract the wanted data. Moreover, it appears that 20 to 25 input samples suffice for finding a suitable XPath for an attribute

    DOOp, an automated wrapper for DAOSPEC

    Full text link
    Large spectroscopic surveys such as the Gaia-ESO Survey produce huge quantities of data. Automatic tools are necessary to efficiently handle this material. The measurement of equivalent widths in stellar spectra is traditionally done by hand or with semi-automatic procedures that are time-consuming and not very robust with respect to the repeatability of the results. The program DAOSPEC is a tool that provides consistent measurements of equivalent widths in stellar spectra while requiring a minimum of user intervention. However, it is not optimised to deal with large batches of spectra, as some parameters still need to be modified and checked by the user. Exploiting the versatility and portability of BASH, we have built a pipeline called DAOSPEC Option Optimiser (DOOp) automating the procedure of equivalent widths measurement with DAOSPEC. DOOp is organised in different modules that run one after the other to perform specific tasks, taking care of the optimisation of the parameters needed to provide the final equivalent widths, and providing log files to ensure better control over the procedure. In this paper, making use of synthetic and observed spectra, we compare the performance of DOOp with other methods, including DAOSPEC used manually. The measurements made by DOOp are identical to the ones produced by DAOSPEC when used manually, while requiring less user intervention, which is convenient when dealing with a large quantity of spectra. DOOp shows its best performance on high-resolution spectra (R>20 000) and high signal-to-noise ratio (S/N>30), with uncertainties ranging from 6 m{\AA} to 2 m{\AA}. The only subjective parameter that remains is the normalisation, as the user still has to make a choice on the order of the polynomial used for the continuum fitting. As a test, we use the equivalent widths measured by DOOp to re-derive the stellar parameters of four well-studied stars

    Wrapper syntax for example-based machine translation

    Get PDF
    TransBooster is a wrapper technology designed to improve the performance of wide-coverage machine translation systems. Using linguistically motivated syntactic information, it automatically decomposes source language sentences into shorter and syntactically simpler chunks, and recomposes their translation to form target language sentences. This generally improves both the word order and lexical selection of the translation. To date, TransBooster has been successfully applied to rule-based MT, statistical MT, and multi-engine MT. This paper presents the application of TransBooster to Example-Based Machine Translation. In an experiment conducted on test sets extracted from Europarl and the Penn II Treebank we show that our method can raise the BLEU score up to 3.8% relative to the EBMT baseline. We also conduct a manual evaluation, showing that TransBooster-enhanced EBMT produces a better output in terms of fluency than the baseline EBMT in 55% of the cases and in terms of accuracy in 53% of the cases
    corecore