
    Integrating Deep-Web Information Sources

    Deep-web information sources are difficult to integrate into automated business processes if they only provide a search form. A wrapping agent is a piece of software that allows a developer to query such information sources without worrying about the details of interacting with such forms. Our goal is to help software engineers construct wrapping agents that interpret queries written in high-level structured languages. We expect this to reduce integration costs, because it relieves developers of the burden of transforming their queries into low-level interactions in an ad hoc manner. In this paper, we report on our reference framework, delve into the related work, and highlight current research challenges. This is intended to help guide future research efforts in this area.
    Ministerio de Educación y Ciencia TIN2007-64119; Junta de Andalucía P07-TIC-2602; Junta de Andalucía P08-TIC-4100; Ministerio de Ciencia e Innovación TIN2008-04718-

    Mejorando las técnicas de verificación de wrappers web mediante técnicas bioinspiradas y de clasificación (Improving web wrapper verification techniques using bio-inspired and classification techniques)

    Many enterprise applications require wrappers to deal with information from the deep web. Wrappers are automated systems that navigate, extract, structure, and verify relevant information from the web. One of their components, the information extractor, consists of a set of extraction rules that are usually based on HTML tags. Therefore, if the sources change, the wrapper may in some cases return information the company does not want and cause, at best, delays in its decision-making. Several wrapper verification systems have been developed to automatically detect when a wrapper is extracting incorrect data. These systems have a number of shortcomings that stem from assuming that the data to be verified follow a set of pre-established statistical characteristics. This dissertation analyzes these systems, designs a framework for developing verifiers, and approaches the verification problem from two different points of view. First, we place it within the field of computational optimization and solve it by applying bio-inspired metaheuristics such as those based on ant colonies, specifically the BWAS algorithm. Subsequently, we formulate and solve it as an unsupervised classification problem. The result of this second approach is MAVE, a multilevel verifier based mainly on one-class classifiers.