Wrapper Maintenance: A Machine Learning Approach
The proliferation of online information sources has led to an increased use
of wrappers for extracting data from Web sources. While most of the previous
research has focused on quick and efficient generation of wrappers, the
development of tools for wrapper maintenance has received less attention. This
is an important research problem because Web sources often change in ways that
prevent the wrappers from extracting data correctly. We present an efficient
algorithm that learns structural information about data from positive examples
alone. We describe how this information can be used for two wrapper maintenance
applications: wrapper verification and reinduction. The wrapper verification
system detects when a wrapper is not extracting correct data, usually because
the Web source has changed its format. The reinduction algorithm automatically
recovers from changes in the Web source by identifying data on Web pages so
that a new wrapper may be generated for this source. To validate our approach,
we monitored 27 wrappers over a period of a year. The verification algorithm
correctly discovered 35 of the 37 wrapper changes, and made 16 mistakes,
resulting in precision of 0.73 and recall of 0.95. We validated the reinduction
algorithm on ten Web sources. We were able to successfully reinduce the
wrappers, obtaining precision and recall values of 0.90 and 0.80 on the data
extraction task.
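The verification idea above (learn what correct data looks like from positive examples alone, then flag a wrapper whose output no longer fits) can be illustrated with a much-simplified sketch. It assumes, for illustration only, that each extracted value is profiled by a coarse token-type signature and that a wrapper is flagged when too few new values match signatures learned while it was known to work. The token classes, helper names and the 0.8 threshold are hypothetical, and this is not the authors' algorithm.

```python
import re

# Hypothetical coarse token classes, tried in order; the first match wins.
TOKEN_CLASSES = [
    ("NUM", re.compile(r"^\d+(?:[.,]\d+)*$")),
    ("CAPS", re.compile(r"^[A-Z][A-Za-z]*$")),
    ("LOWER", re.compile(r"^[a-z]+$")),
    ("PUNCT", re.compile(r"^[^\w\s]+$")),
]


def signature(value: str) -> tuple:
    """Map an extracted string to a tuple of coarse token types."""
    types = []
    for tok in value.split():
        for name, pattern in TOKEN_CLASSES:
            if pattern.match(tok):
                types.append(name)
                break
        else:
            types.append("ALNUM")  # fallback for mixed tokens such as "$12" or "N/A"
    return tuple(types)


def learn_profile(examples: list) -> set:
    """Learn the set of signatures seen in positive examples only."""
    return {signature(v) for v in examples}


def wrapper_ok(profile: set, new_values: list, threshold: float = 0.8) -> bool:
    """Return False (wrapper suspect) when too few new values fit the profile."""
    if not new_values:
        return False  # extracting nothing at all is itself suspicious
    hits = sum(signature(v) in profile for v in new_values)
    return hits / len(new_values) >= threshold


# Usage: learn from prices extracted while the wrapper was known to work.
profile = learn_profile(["12.50", "7", "1,299.00"])
print(wrapper_ok(profile, ["9.99", "15"]))           # True
print(wrapper_ok(profile, ["Out of stock", "N/A"]))  # False
```

Because the profile is learned from positive examples only, it never sees failure cases; the threshold trades false alarms against missed changes, the same precision/recall tension reported in the abstract.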
Automating the IEEE std. 1500 compliance verification for embedded cores
The IEEE 1500 standard for embedded core testing offers a very effective solution for testing modern systems-on-chip (SoCs). It proposes a flexible hardware test wrapper architecture, together with a core test language (CTL) used to describe the implemented wrapper functionalities. Several IP providers have already announced compliance in both existing and future design blocks. In this paper we address the challenge of guaranteeing the compliance of a wrapper architecture and its CTL description with IEEE std. 1500. This is a mandatory step to fully trust the wrapper functionalities in applying the test sequences to the core. The proposed solution aims at implementing a verification framework that allows core providers and/or integrators to automatically verify the compliance of their products (sold or purchased) with the standard.
IEEE Standard 1500 Compliance Verification for Embedded Cores
Core-based design and reuse are the two key elements for efficient system-on-chip (SoC) development. Unfortunately, they also introduce new challenges in SoC testing, such as core test reuse and the need for a common test infrastructure that works with cores from different vendors. The IEEE 1500 Standard for Embedded Core Testing addresses these issues by proposing a flexible hardware test wrapper architecture for embedded cores, together with a core test language (CTL) used to describe the implemented wrapper functionalities. Several intellectual property providers have already announced IEEE Standard 1500 compliance in both existing and future design blocks. In this paper, we address the problem of guaranteeing the compliance of a wrapper architecture and its CTL description with IEEE Standard 1500. This step is mandatory to fully trust the wrapper functionalities in applying the test sequences to the core. We present a systematic methodology to build a verification framework for IEEE Standard 1500 compliant cores, allowing core providers and/or integrators to verify the compliance of their products (sold or purchased) with the standard.
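One small ingredient of such a verification framework, a static check of a wrapper description against mandatory features, might look like the sketch below. The rule set is an assumed, illustrative subset of the standard's requirements (mandatory WS_BYPASS and WS_EXTEST instructions, at least one of WS_INTEST_SCAN or WS_INTEST_RING, and the WIR, WBY and WBR registers) and should be checked against the text of IEEE Std 1500 before use. The `WrapperDescription` type and `compliance_errors` function are hypothetical; nothing here parses real CTL.

```python
from dataclasses import dataclass, field

# Assumed, illustrative subset of IEEE 1500 requirements; verify against the standard.
REQUIRED_INSTRUCTIONS = {"WS_BYPASS", "WS_EXTEST"}
INTEST_ALTERNATIVES = {"WS_INTEST_SCAN", "WS_INTEST_RING"}
REQUIRED_REGISTERS = {"WIR", "WBY", "WBR"}


@dataclass
class WrapperDescription:
    """Hypothetical, pre-parsed summary of a wrapper and its CTL description."""
    instructions: set = field(default_factory=set)
    registers: set = field(default_factory=set)


def compliance_errors(desc: WrapperDescription) -> list:
    """Return human-readable violations; an empty list means all checks passed."""
    errors = []
    for instr in sorted(REQUIRED_INSTRUCTIONS - desc.instructions):
        errors.append(f"missing mandatory instruction {instr}")
    if not desc.instructions & INTEST_ALTERNATIVES:
        errors.append("missing an INTEST instruction (WS_INTEST_SCAN or WS_INTEST_RING)")
    for reg in sorted(REQUIRED_REGISTERS - desc.registers):
        errors.append(f"missing mandatory register {reg}")
    return errors


# Usage: one description that passes and one that does not.
core = WrapperDescription(
    instructions={"WS_BYPASS", "WS_EXTEST", "WS_INTEST_RING"},
    registers={"WIR", "WBY", "WBR"},
)
print(compliance_errors(core))  # []

incomplete = WrapperDescription(instructions={"WS_BYPASS"}, registers={"WIR", "WBY"})
for err in compliance_errors(incomplete):
    print(err)
```

A static check like this only confirms that the description declares the required features; confirming that the hardware actually behaves as the CTL claims would additionally require simulating the wrapper.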
On Systematic Design of Protectors for Employing OTS Items
Off-the-shelf (OTS) components are increasingly used in application areas with stringent dependability requirements. Component wrapping is a well-known structuring technique used in many areas. We propose a general approach to developing protective wrappers that assist in integrating OTS items, with a focus on overall system dependability. The wrappers are viewed as redundant software used to detect errors or suspicious activity and to execute appropriate recovery when possible; wrapper development is considered part of system integration activities. Wrappers are to be rigorously specified and executed at run time as a means of protecting OTS items against faults in the rest of the system, and the system against the OTS item's faults. Possible symptoms of erroneous behaviour to be detected by a protective wrapper, and possible actions to be undertaken in response, are listed and discussed. The information required for wrapper development is provided by traceability analysis. Possible approaches to implementing “protectors” in current standard component technologies are briefly outlined.
Sample-based XPath Ranking for Web Information Extraction
Web information extraction typically relies on a wrapper, i.e., program code or a configuration that specifies how to extract some information from web pages at a specific website. Manually creating and maintaining wrappers is a cumbersome and error-prone task. It may even be prohibitive, as some applications require information extraction from previously unseen websites. This paper approaches the problem of automatic on-the-fly wrapper creation for websites that provide attribute data for objects in a ‘search – search result page – detail page’ setup. The approach is a wrapper induction approach that uses a small and easily obtainable set of sample data to rank XPaths by their suitability for extracting the wanted attribute data. Experiments show that the automatically generated top-ranked XPaths indeed extract the wanted data. Moreover, it appears that 20 to 25 input samples suffice for finding a suitable XPath for an attribute.
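The ranking idea can be sketched with the standard library: for each sample (a detail page plus an attribute value already known for it), collect the absolute paths of elements whose text equals that value, then rank paths by how many samples support them. This assumes well-formed (XHTML-like) pages parsed with `xml.etree.ElementTree`, exact text matching, and simple count-based scoring; the helper names are hypothetical and the paper's actual scoring function may differ.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def element_paths(root: ET.Element):
    """Yield (absolute XPath-like path, element) for every element in the tree."""
    def walk(elem, path):
        yield path, elem
        seen = Counter()
        for child in elem:
            seen[child.tag] += 1
            # Positional predicates count same-tag siblings, as in XPath.
            yield from walk(child, f"{path}/{child.tag}[{seen[child.tag]}]")
    yield from walk(root, f"/{root.tag}")


def rank_xpaths(samples: list) -> list:
    """Rank candidate paths by the number of samples whose known value they locate."""
    support = Counter()
    for page, value in samples:
        root = ET.fromstring(page)
        matches = {path for path, elem in element_paths(root)
                   if (elem.text or "").strip() == value}
        support.update(matches)  # each path counts at most once per sample
    return support.most_common()


# Usage: three detail pages, each with a known price value.
samples = [
    ("<html><body><h1>Acme Phone</h1><span>Price</span><span>199</span></body></html>", "199"),
    ("<html><body><h1>Zeta Phone</h1><span>Price</span><span>249</span></body></html>", "249"),
    ("<html><body><h1>300</h1><span>Price</span><span>300</span></body></html>", "300"),
]
for path, count in rank_xpaths(samples):
    print(count, path)
# 3 /html/body[1]/span[2]
# 1 /html/body[1]/h1[1]
```

The third page shows why several samples are needed: an accidental match (the heading that happens to read "300") gets some support, but the consistently correct path is outvoted into first place as more samples are added.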
DOOp, an automated wrapper for DAOSPEC
Large spectroscopic surveys such as the Gaia-ESO Survey produce huge
quantities of data. Automatic tools are necessary to efficiently handle this
material. The measurement of equivalent widths in stellar spectra is
traditionally done by hand or with semi-automatic procedures that are
time-consuming and not very robust with respect to the repeatability of the
results. The program DAOSPEC is a tool that provides consistent measurements of
equivalent widths in stellar spectra while requiring a minimum of user
intervention. However, it is not optimised to deal with large batches of
spectra, as some parameters still need to be modified and checked by the user.
Exploiting the versatility and portability of BASH, we have built a pipeline
called DAOSPEC Option Optimiser (DOOp) that automates equivalent-width
measurement with DAOSPEC. DOOp is organised in different modules that
run one after the other to perform specific tasks, taking care of the
optimisation of the parameters needed to provide the final equivalent widths,
and providing log files to ensure better control over the procedure. In this
paper, making use of synthetic and observed spectra, we compare the performance
of DOOp with other methods, including DAOSPEC used manually. The measurements
made by DOOp are identical to the ones produced by DAOSPEC when used manually,
while requiring less user intervention, which is convenient when dealing with a
large quantity of spectra. DOOp shows its best performance on high-resolution
spectra (R > 20 000) and high signal-to-noise ratio (S/N > 30), with uncertainties
ranging from 6 mÅ to 2 mÅ. The only subjective parameter that remains
is the normalisation, as the user still has to choose the order of
the polynomial used for the continuum fitting. As a test, we use the equivalent
widths measured by DOOp to re-derive the stellar parameters of four
well-studied stars.
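For readers unfamiliar with the quantity DOOp automates, the equivalent width (EW) of an absorption line is the width of a fully dark rectangle with the same area as the line in the continuum-normalised spectrum, EW = ∫ (1 − F/F_c) dλ. The sketch below integrates that definition with the trapezoid rule on a synthetic Gaussian line and compares it with the analytic value d·σ·√(2π). It is a textbook illustration of the quantity, not the DAOSPEC or DOOp procedure, and the line parameters are made up.

```python
import math


def gaussian_line(wl: float, center: float, depth: float, sigma: float) -> float:
    """Continuum-normalised flux F/Fc of a single Gaussian absorption line."""
    return 1.0 - depth * math.exp(-0.5 * ((wl - center) / sigma) ** 2)


def equivalent_width(wavelengths: list, norm_flux: list) -> float:
    """Trapezoid-rule integral of (1 - F/Fc) over wavelength, in the wavelength unit."""
    ew = 0.0
    for i in range(1, len(wavelengths)):
        dl = wavelengths[i] - wavelengths[i - 1]
        ew += 0.5 * dl * ((1.0 - norm_flux[i - 1]) + (1.0 - norm_flux[i]))
    return ew


# Hypothetical line: centre 6000 Å, depth 0.4, sigma 0.1 Å, sampled every 0.01 Å over ±1 Å.
step = 0.01
wl = [5999.0 + i * step for i in range(201)]
flux = [gaussian_line(w, 6000.0, 0.4, 0.1) for w in wl]

ew_mA = 1000.0 * equivalent_width(wl, flux)                # Å -> mÅ
analytic_mA = 1000.0 * 0.4 * 0.1 * math.sqrt(2 * math.pi)  # ≈ 100.3 mÅ
print(f"numerical {ew_mA:.1f} mÅ, analytic {analytic_mA:.1f} mÅ")
```

Because the integrand is 1 − F/F_c, any error in the continuum level F_c propagates directly into the EW, which is why the choice of polynomial order for the continuum fit remains the subjective step noted in the abstract.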
Wrapper syntax for example-based machine translation
TransBooster is a wrapper technology designed to improve the performance of wide-coverage machine translation systems. Using linguistically motivated syntactic information, it automatically decomposes source-language sentences into shorter and syntactically simpler chunks, and recomposes their translations to form target-language sentences. This generally improves both the word order and the lexical selection of the translation. To date, TransBooster has been successfully applied to rule-based MT, statistical MT, and multi-engine MT. This paper presents the application of TransBooster to Example-Based Machine Translation (EBMT). In an experiment conducted on test sets extracted from Europarl and the Penn II Treebank, we show that our method can raise the BLEU score by up to 3.8% relative to the EBMT baseline. We also conduct a manual evaluation, showing that TransBooster-enhanced EBMT produces better output than the baseline EBMT in terms of fluency in 55% of the cases and in terms of accuracy in 53% of the cases.
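The decompose, translate and recompose loop can be illustrated with a toy sketch. Everything here is hypothetical: the sentence arrives already split into a skeleton with placeholder slots plus its chunks (TransBooster derives its decomposition from a syntactic parse), the "MT engine" is a word-for-word dictionary stand-in for any black-box system, and the engine is assumed to pass placeholder tokens through unchanged.

```python
from functools import partial


def toy_mt(text: str, lexicon: dict) -> str:
    """Stand-in for a black-box MT engine: word-for-word lookup, unknown tokens kept."""
    return " ".join(lexicon.get(tok, tok) for tok in text.split())


def boost(skeleton: str, chunks: dict, translate) -> str:
    """Translate the skeleton and each chunk separately, then recompose."""
    target = translate(skeleton)  # placeholders assumed to pass through unchanged
    for slot, chunk in chunks.items():
        # Each chunk is shorter and simpler than the full sentence.
        target = target.replace(slot, translate(chunk))
    return target


# Usage with a hypothetical English-to-Spanish lexicon.
lexicon = {"the": "el", "committee": "comité", "approved": "aprobó", "report": "informe"}
engine = partial(toy_mt, lexicon=lexicon)
print(boost("[S1] approved [S2]",
            {"[S1]": "the committee", "[S2]": "the report"},
            engine))
# el comité aprobó el informe
```

Opaque placeholders keep the sketch short; a real engine may reorder or even translate such tokens, so a practical system needs a more careful substitution strategy for the parts it removes from the skeleton.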